Yann Neuhaus

dbi services technical blog

Converting columns from one data type to another in PostgreSQL

Mon, 2019-07-08 00:19

Usually you should use the data type that best fits the representation of your data in a relational database. But how many times have you seen applications that store dates or numbers as text, or dates as integers? This is not as uncommon as you might think, and fixing it can be quite a challenge because you need to cast from one data type to another when you want to change the data type used for a specific column. Depending on the current format of the data this might be easy to fix, or it might become more complicated. PostgreSQL has a quite clever way of doing it.

Frequent readers of our blog might know that already: We start with a simple, reproducible test setup:

postgres=# create table t1 ( a int, b text );
CREATE TABLE
postgres=# insert into t1 values ( 1, '20190101');
INSERT 0 1
postgres=# insert into t1 values ( 2, '20190102');
INSERT 0 1
postgres=# insert into t1 values ( 3, '20190103');
INSERT 0 1
postgres=# select * from t1;
 a |    b     
---+----------
 1 | 20190101
 2 | 20190102
 3 | 20190103
(3 rows)

What do we have here? A simple table with two columns: column “a” is an integer and column “b” is of type text. For humans it seems obvious that the second column actually contains dates, just stored as text. What options do we have to fix that? We could try something like this:

postgres=# alter table t1 add column c date default (to_date(b,'YYYYMMDD'));
psql: ERROR:  cannot use column reference in DEFAULT expression

That obviously does not work. Another option would be to add another column with the correct data type, populate that column and then drop the original one:

postgres=# alter table t1 add column c date;
ALTER TABLE
postgres=# update t1 set c = to_date(b,'YYYYMMDD');
UPDATE 3
postgres=# alter table t1 drop column b;
ALTER TABLE

But what is the downside of that? This will probably break the application because the column name changed (you could rename the new column back to the old name afterwards, but the column position still changes, which some applications are sensitive to). Is there a better way of doing it? Let’s start from scratch:

postgres=# create table t1 ( a int, b text );
CREATE TABLE
postgres=# insert into t1 values ( 1, '20190101');
INSERT 0 1
postgres=# insert into t1 values ( 2, '20190102');
INSERT 0 1
postgres=# insert into t1 values ( 3, '20190103');
INSERT 0 1
postgres=# select * from t1;
 a |    b     
---+----------
 1 | 20190101
 2 | 20190102
 3 | 20190103
(3 rows)

The same setup as before. What other options do we have to convert "b" to a real date without changing the name of the column? Let's try the most obvious way and let PostgreSQL decide what to do:

postgres=# alter table t1 alter column b type date;
psql: ERROR:  column "b" cannot be cast automatically to type date
HINT:  You might need to specify "USING b::date".

This does not work as PostgreSQL in this case cannot know how to go from one data type to another. But the “HINT” already tells us what we might need to do:

postgres=# alter table t1 alter column b type date using (b::date);
ALTER TABLE
postgres=# \d t1
                 Table "public.t1"
 Column |  Type   | Collation | Nullable | Default 
--------+---------+-----------+----------+---------
 a      | integer |           |          | 
 b      | date    |           |          | 

postgres=# 

For our data in the “b” column that does work. But consider you have data like this:

postgres=# drop table t1;
DROP TABLE
postgres=# create table t1 ( a int, b text );
CREATE TABLE
postgres=# insert into t1 values (1,'01-JAN-2019');
INSERT 0 1
postgres=# insert into t1 values (2,'02-JAN-2019');
INSERT 0 1
postgres=# insert into t1 values (3,'03-JAN-2019');
INSERT 0 1
postgres=# select * from t1;
 a |      b      
---+-------------
 1 | 01-JAN-2019
 2 | 02-JAN-2019
 3 | 03-JAN-2019
(3 rows)

Would that still work?

postgres=# alter table t1 alter column b type date using (b::date);
ALTER TABLE
postgres=# select * from t1;
 a |     b      
---+------------
 1 | 2019-01-01
 2 | 2019-01-02
 3 | 2019-01-03
(3 rows)

Yes, but in this case it will not:

postgres=# drop table t1;
DROP TABLE
postgres=# create table t1 ( a int, b text );
CREATE TABLE
postgres=# insert into t1 values (1,'First--January--19');
INSERT 0 1
postgres=# insert into t1 values (2,'Second--January--19');
INSERT 0 1
postgres=# insert into t1 values (3,'Third--January--19');
INSERT 0 1
postgres=# select * from t1;
 a |          b           
---+---------------------
 1 | First--January--19
 2 | Second--January--19
 3 | Third--January--19
(3 rows)

postgres=# alter table t1 alter column b type date using (b::date);
psql: ERROR:  invalid input syntax for type date: "First--January--19"
postgres=# 

As PostgreSQL has no idea how to do the conversion this fails, no surprise here. But you still have the power to do it, by providing a function that does the conversion in exactly the way you want:

create or replace function f_convert_to_date ( pv_text in text ) returns date
as $$
declare
begin
  return date('20190101');
end;
$$ language plpgsql;

Of course you would add logic to parse the input string so that the function returns the matching date and not a constant as in this example (a sketch of such a parsing function follows further down). For demonstration purposes we will go with this fake function:

postgres=# alter table t1 alter column b type date using (f_convert_to_date(b));
ALTER TABLE
postgres=# \d t1
                 Table "public.t1"
 Column |  Type   | Collation | Nullable | Default 
--------+---------+-----------+----------+---------
 a      | integer |           |          | 
 b      | date    |           |          | 

postgres=# select * from t1;
 a |     b      
---+------------
 1 | 2019-01-01
 2 | 2019-01-01
 3 | 2019-01-01
(3 rows)

… and here we go. The column was converted from text to date and we provided the exact way of doing that by calling a function that contains the conversion logic. As long as the output of the function conforms to the target data type and you did not make any mistakes, you can potentially go from any source data type to any target data type.
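For our sample format above a parsing function with real logic could look something like this (just a sketch: mapping only the three ordinal words that appear in the test data is an assumption, anything else yields NULL):

create or replace function f_convert_to_date ( pv_text in text ) returns date
as $$
declare
  -- the sample format is 'First--January--19': day word, month name, two digit year
  lv_day   text := split_part(pv_text, '--', 1);
  lv_month text := split_part(pv_text, '--', 2);
  lv_year  text := split_part(pv_text, '--', 3);
begin
  return to_date( case lv_day
                    when 'First'  then '01'
                    when 'Second' then '02'
                    when 'Third'  then '03'
                  end || '-' || lv_month || '-20' || lv_year
                , 'DD-Month-YYYY');
end;
$$ language plpgsql;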

There is one remaining question: Will that block other sessions selecting from the table while the conversion is ongoing?

postgres=# drop table t1;
DROP TABLE
postgres=# create table t1 ( a int, b text );
CREATE TABLE
postgres=# insert into t1 select a, '20190101' from generate_series(1,1000000) a;
INSERT 0 1000000
postgres=# create index i1 on t1(a);
CREATE INDEX

In one session we will do the conversion and in the other session we will do a simple select that goes over the index:

-- first session
postgres=# alter table t1 alter column b type date using (f_convert_to_date(b));

Second one at the same time:

-- second session
postgres=# select * from t1 where a = 1;
-- blocks
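
From a third session you can confirm who is blocking whom while the conversion is running (pg_blocking_pids is available since PostgreSQL 9.6):

-- third session
postgres=# select pid, pg_blocking_pids(pid) as blocked_by, query
           from pg_stat_activity
           where cardinality(pg_blocking_pids(pid)) > 0;

This should list the select from the second session together with the PID of the session running the ALTER TABLE.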

Yes, that will block, so you should plan such actions carefully when you have a busy system. But this is still better than adding a new column.


Telling the PostgreSQL optimizer more about your functions

Sun, 2019-07-07 05:29

When you reference or call functions in PostgreSQL the optimizer does not really know much about the cost, nor the number of rows, that a function returns. This is not really surprising as it is hard to predict what a function is doing and how many rows will be returned for a given set of input parameters. What you might not know is that you can indeed tell the optimizer a bit more about your functions.

As usual let’s start with a little test setup:

postgres=# create table t1 ( a int, b text, c date );
CREATE TABLE
postgres=# insert into t1 select a,a::text,now() from generate_series(1,1000000) a;
INSERT 0 1000000
postgres=# create unique index i1 on t1(a);
CREATE INDEX
postgres=# analyze t1;
ANALYZE

A simple table containing 1’000’000 rows and one unique index. In addition let’s create a simple function that will return exactly one row from that table:

create or replace function f_tmp ( a_id in int ) returns setof t1
as $$
declare
begin
  return query select * from t1 where a = $1;
end;
$$ language plpgsql;

What is the optimizer doing when you call that function?

postgres=# explain (analyze) select f_tmp (1);
                                         QUERY PLAN                                         
--------------------------------------------------------------------------------------------
 ProjectSet  (cost=0.00..5.27 rows=1000 width=32) (actual time=0.654..0.657 rows=1 loops=1)
   ->  Result  (cost=0.00..0.01 rows=1 width=0) (actual time=0.003..0.004 rows=1 loops=1)
 Planning Time: 0.047 ms
 Execution Time: 0.696 ms
(4 rows)

We know that only one row will be returned, but the optimizer assumes that 1000 rows will be returned. This is the default, and it is documented. So, no matter how many rows will really be returned, PostgreSQL will always estimate 1000. But you have some control and can tell the optimizer that the function will return one row only:

create or replace function f_tmp ( a_id in int ) returns setof t1
as $$
declare
begin
  return query select * from t1 where a = $1;
end;
$$ language plpgsql
   rows 1;

Looking at the execution plan again:

postgres=# explain (analyze) select f_tmp (1);
                                        QUERY PLAN                                        
------------------------------------------------------------------------------------------
 ProjectSet  (cost=0.00..0.27 rows=1 width=32) (actual time=0.451..0.454 rows=1 loops=1)
   ->  Result  (cost=0.00..0.01 rows=1 width=0) (actual time=0.003..0.004 rows=1 loops=1)
 Planning Time: 0.068 ms
 Execution Time: 0.503 ms
(4 rows)

Instead of 1000 rows we now see that only 1 row was estimated, which is what we specified when we created the function. Of course this is a very simple example and in reality you often might not be able to tell exactly how many rows will be returned from a function. But at least you can provide a better estimate than the default of 1000. In addition you can also specify a cost for your function (expressed in units of cpu_operator_cost):

create or replace function f_tmp ( a_id in int ) returns setof t1
as $$
declare
begin
  return query select * from t1 where a = $1;
end;
$$ language plpgsql
   rows 1
   cost 1;
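
Both settings end up in the system catalog, so you can check what the optimizer will work with by looking at the prorows and procost columns of pg_proc:

postgres=# select proname, prorows, procost from pg_proc where proname = 'f_tmp';
 proname | prorows | procost 
---------+---------+---------
 f_tmp   |       1 |       1
(1 row)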

If you use functions remember that you can give the optimizer more information and that there is a default of 1000.


SQL Server containers and docker network driver performance considerations

Fri, 2019-07-05 01:45

A few months ago I attended Franck Pachot's session about microservices and databases at SOUG Romandie in Lausanne, on May 21, 2019. He covered some performance challenges that can be introduced by a microservices architecture design, especially when database components come into the game with chatty applications. One year ago, I was in a situation where a customer installed some SQL Server 2017 Linux containers in a Docker infrastructure with user applications located outside of this infrastructure. It is likely an uncommon way to start with containers, but anyway, when you immerse yourself in the Docker world you quickly notice there are a lot of network drivers and considerations you should be aware of. So, for the sake of curiosity, I proposed to my customer to perform some network benchmark tests to get a clear picture of these network drivers and their related overhead, in order to design the Docker infrastructure correctly from a performance standpoint.

The initial customer scenario included a standalone Docker infrastructure, and we considered different approaches to the application network configuration from a performance perspective. We did the same for the second scenario, a Docker Swarm infrastructure we installed in a second step.

The Initial reference – Host network and Docker host network

The first point was to get an initial reference, with no network management overhead, directly from the host network. We used the iperf3 tool for the tests. This is the kind of tool I also use with virtual environments to ensure the network throughput is what we really expect, and I have sometimes had surprises on this topic. So, back to the container world: each test was performed from a Linux host outside of the concerned Docker infrastructure, according to the customer scenario.

The network card attached to the Docker host is supposed to have a link speed of 10 Gbit/s …

$ sudo ethtool eth0 | grep "Speed"
        Speed: 10000Mb/s

 

… and it is confirmed by the first iperf3 output below:

Let’s say that we tested the Docker host driver as well and we got similar results.

$ docker run  -it --rm --name=iperf3-server  --net=host networkstatic/iperf3 -s
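
The client side of each test was run from the external Linux host along these lines (address, port and duration are placeholders, not the exact values we used):

$ iperf3 -c <docker_host_ip> -p 5201 -t 30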

 

Docker bridge mode

The default modus operandi for a Docker host is to create a virtual ethernet bridge (called docker0), attach each container's network interface to the bridge, and use network address translation (NAT) when containers need to make themselves visible to the Docker host and beyond. Unless specified otherwise, a Docker container will use this driver by default, and this is exactly the network driver used by the containers in the context of my customer. In fact, we used a user-defined bridge network, but I would say it doesn't matter for the tests we performed here.

$ ip addr show docker0
5: docker0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
    link/ether 02:42:70:0a:e8:7a brd ff:ff:ff:ff:ff:ff
    inet 172.17.0.1/16 brd 172.17.255.255 scope global docker0
       valid_lft forever preferred_lft forever
    inet6 fe80::42:70ff:fe0a:e87a/64 scope link
       valid_lft forever preferred_lft forever

 

The iperf3 docker container I ran for my tests is using the default bridge network as shown below. The interface with index 24 corresponds to the veth0bfc2dc peer of the concerned container.

$ docker run  -d --name=iperf3-server -p 5204:5201 networkstatic/iperf3 -s
…
$ docker ps | grep iperf
5c739940e703        networkstatic/iperf3              "iperf3 -s"              38 minutes ago      Up 38 minutes                0.0.0.0:5204->5201/tcp   iperf3-server
$ docker exec -ti 5c7 ip addr show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
24: eth0@if25: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
    link/ether 02:42:ac:11:00:02 brd ff:ff:ff:ff:ff:ff
    inet 172.17.0.2/16 brd 172.17.255.255 scope global eth0
       valid_lft forever preferred_lft forever

[clustadmin@docker1 ~]$ ethtool -S veth0bfc2dc
NIC statistics:
     peer_ifindex: 24

 

Here the output after running the iperf3 benchmark:

It’s worth noting that the bridge network adds some overhead, with an impact of 13% in my tests. In fact this is an expected outcome, especially if we refer to the Docker documentation:

Compared to the default bridge mode, the host mode gives significantly better networking performance since it uses the host’s native networking stack whereas the bridge has to go through one level of virtualization through the docker daemon.

 

When the docker-proxy comes into play

The next scenario we wanted to test concerned the closest network proximity we may have between the user applications and the SQL Server containers in the Docker infrastructure. In other words, we assumed the application resides on the same host as the SQL Server container, and we got some surprises from the docker-proxy itself.

Before looking at the iperf3 result, I think we have to answer the million-dollar question here: what is the docker-proxy? Have you ever paid attention to this process on your Docker host? Let's run a pstree command:

$ pstree
systemd─┬─NetworkManager───2*[{NetworkManager}]
        ├─agetty
        ├─auditd───{auditd}
        ├─containerd─┬─containerd-shim─┬─npm─┬─node───9*[{node}]
        │            │                 │     └─9*[{npm}]
        │            │                 └─12*[{containerd-shim}]
        │            ├─containerd-shim─┬─registry───9*[{registry}]
        │            │                 └─10*[{containerd-shim}]
        │            ├─containerd-shim─┬─iperf3
        │            │                 └─9*[{containerd-shim}]
        │            └─16*[{containerd}]
        ├─crond
        ├─dbus-daemon
        ├─dockerd─┬─docker-proxy───7*[{docker-proxy}]
        │         └─20*[{dockerd}]

 

Well, if I understand the Docker documentation correctly, the purpose of this process is to enable a service consumer to communicate with the service-providing container, but it is only used in particular circumstances. Just bear in mind that controlling access to a container's service is mostly done through the host netfilter framework, in both the NAT and filter tables, and that the docker-proxy mechanism is required only when this method of control is not available:

  • When the Docker daemon is started with --iptables=false or --ip-forward=false, or when the Linux host cannot act as a router because the kernel parameter net.ipv4.ip_forward is set to 0. This is not my case here.
  • When you are using localhost in the connection string of your application, which implies using the loopback interface (127.0.0.0/8), and the kernel does not allow routing traffic from it. Therefore, it is not possible to apply netfilter NAT rules and instead netfilter sends packets through the filter table's INPUT chain to a local process listening on the published port: the docker-proxy
$ sudo iptables -L -n -t nat | grep 127.0.0.0
DOCKER     all  --  0.0.0.0/0           !127.0.0.0/8          ADDRTYPE match dst-type LOCAL

 

In the picture below you will notice I'm using the localhost keyword in my connection string, so the docker-proxy comes into play.

A huge performance impact for sure: about 28%. This performance drop may be explained by the fact that the docker-proxy process is consuming 100% of my CPUs:

The docker-proxy operates in userland and I may simply disable it with the docker daemon parameter "userland-proxy": false, but I would say this is a case we would not encounter in practice, because applications will never use localhost in their connection strings. By the way, changing the connection string from localhost to the IP address of the container host gives a very different outcome, similar to the Docker bridge network scenario.
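
For reference, disabling the userland proxy is done in the Docker daemon configuration, typically /etc/docker/daemon.json, followed by a daemon restart (a minimal sketch):

{
  "userland-proxy": false
}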

 

Using an overlay network

Using a single Docker host doesn't fit well with HA or scalability requirements, and in a mission-critical environment I doubt any customer will go this way. I recommended that my customer consider using an orchestrator like Docker Swarm or K8s to anticipate the container workload coming from future projects. The customer picked Docker Swarm for its easier implementation compared to K8s.

 

After implementing a proof of concept for testing purposes (3 nodes: one manager and two worker nodes), we took the opportunity to measure the potential overhead implied by the overlay network, which is the common driver used by containers through stacks and services in such a situation. Referring to the Docker documentation, overlay networks manage communication among the Docker daemons participating in the swarm and are used by the services deployed on it. Here are the Docker nodes in the swarm infrastructure:

$ docker node ls
ID                            HOSTNAME                    STATUS              AVAILABILITY        MANAGER STATUS      ENGINE VERSION
vvdofx0fjzcj8elueoxoh2irj *   docker1.dbi-services.test   Ready               Active              Leader              18.09.5
njq5x23dw2ubwylkc7n6x63ly     docker2.dbi-services.test   Ready               Active                                  18.09.5
ruxyptq1b8mdpqgf0zha8zqjl     docker3.dbi-services.test   Ready               Active                                  18.09.5

 

An ingress overlay network is created by default when setting up a swarm cluster. User-defined overlay networks may be created afterwards and extend to the other nodes only when needed by containers.

$ docker network ls | grep overlay
NETWORK ID    NAME              DRIVER   SCOPE
ehw16ycy980s  ingress           overlay  swarm

 

Here is the result of the iperf3 benchmark:

Well, the same result as the previous test, with roughly 30% of performance drop. Compared to the initial reference this is again an expected outcome, but I didn't imagine how big the impact could be in such a case. The overlay network introduces additional overhead by putting together, behind the scenes, a VXLAN tunnel (a virtual Layer 2 network on top of an existing Layer 3 infrastructure), VTEP endpoints for the encapsulation/de-encapsulation work, and traffic encryption (by default for the swarm management traffic, optional for the application data).

Here is a summary of the different scenarios and their performance impact:

 Scenario               | Throughput (Gbit/s) | Performance impact
------------------------+---------------------+--------------------
 Host network           | 10.3                |
 Docker host network    | 10.3                |
 Docker bridge network  | 8.93                | 0.78
 Docker proxy           | 7.37                | 0.71
 Docker overlay network | 7.04                | 0.68

 

In the particular case of my customer, where the SQL Server instances sit on the Docker infrastructure and the applications reside outside of it, it's clear that using the Docker host network directly may be a good option from a performance standpoint, assuming this infrastructure remains simple with few SQL Server containers. In this case we have to change the SQL Server default listen port with the MSSQL_TCP_PORT parameter, because Docker host networking doesn't provide port mapping capabilities. According to our tests, we didn't get any evidence of performance improvement in terms of application response time between the Docker network drivers, probably because those applications are not network bound here. But I can imagine scenarios where they could be. Finally, the kind of scenario encountered here is likely uncommon, and I see containerized apps with database components outside the Docker infrastructure more often, but that doesn't change the game at all and the same considerations apply. Today I'm very curious to test real microservices scenarios where database and application components are all sitting on a Docker infrastructure.

See you!

 


Modifying pg_hba.conf from inside PostgreSQL

Sat, 2019-06-29 07:14

During one of the sessions at the last Swiss PGDay there was a question which could not be answered during the talk: Is it possible to modify pg_hba.conf from inside PostgreSQL without having access to the operating system? What everybody agreed on is that there currently is no built-in function for doing this.

When you are on a recent version of PostgreSQL there is a view you can use to display the rules in pg_hba.conf:

postgres=# select * from pg_hba_file_rules ;
 line_number | type  |   database    | user_name |  address  |                 netmask                 | auth_method | options | error 
-------------+-------+---------------+-----------+-----------+-----------------------------------------+-------------+---------+-------
          84 | local | {all}         | {all}     |           |                                         | trust       |         | 
          86 | host  | {all}         | {all}     | 127.0.0.1 | 255.255.255.255                         | trust       |         | 
          88 | host  | {all}         | {all}     | ::1       | ffff:ffff:ffff:ffff:ffff:ffff:ffff:ffff | trust       |         | 
          91 | local | {replication} | {all}     |           |                                         | trust       |         | 
          92 | host  | {replication} | {all}     | 127.0.0.1 | 255.255.255.255                         | trust       |         | 
          93 | host  | {replication} | {all}     | ::1       | ffff:ffff:ffff:ffff:ffff:ffff:ffff:ffff | trust       |         | 
          94 | host  | {all}         | {mydb}    | ::1       | ffff:ffff:ffff:ffff:ffff:ffff:ffff:ffff | trust       |         | 
(7 rows)

But there is nothing which allows you to directly modify that. When you are lucky and have enough permissions there is a way to do it, though. First, let's check where pg_hba.conf is located:

postgres=# select setting from pg_settings where name like '%hba%';
           setting           
-----------------------------
 /u02/pgdata/DEV/pg_hba.conf

Having that information we can load that file to a table:

postgres=# create table hba ( lines text ); 
CREATE TABLE
postgres=# copy hba from '/u02/pgdata/DEV/pg_hba.conf';
COPY 93
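
As we are going to overwrite pg_hba.conf later on, this is a good moment to write an unmodified copy of it to a safe location using the same COPY mechanism (the backup path is just an example):

postgres=# copy hba to '/u02/pgdata/DEV/pg_hba.conf.bak';
COPY 93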

Once it is loaded we have the whole content in our table (skipping the comments and empty lines here):

postgres=# select * from hba where lines !~ '^#' and lines !~ '^$';
                                 lines                                 
-----------------------------------------------------------------------
 local   all             all                                     trust
 host    all             all             127.0.0.1/32            trust
 host    all             all             ::1/128                 trust
 local   replication     all                                     trust
 host    replication     all             127.0.0.1/32            trust
 host    replication     all             ::1/128                 trust
(6 rows)

As this is a normal table we can of course add a row:

postgres=# insert into hba (lines) values ('host  all mydb  ::1/128                 trust');
INSERT 0 1
postgres=# select * from hba where lines !~ '^#' and lines !~ '^$';
                                 lines                                 
-----------------------------------------------------------------------
 local   all             all                                     trust
 host    all             all             127.0.0.1/32            trust
 host    all             all             ::1/128                 trust
 local   replication     all                                     trust
 host    replication     all             127.0.0.1/32            trust
 host    replication     all             ::1/128                 trust
 host  all mydb  ::1/128                 trust
(7 rows)

And now we can write it back:

postgres=# copy hba to '/u02/pgdata/DEV/pg_hba.conf';
COPY 94

Reading the whole file confirms that our new rule is there:

postgres=# select pg_read_file('pg_hba.conf');
                               pg_read_file                               
--------------------------------------------------------------------------
 # PostgreSQL Client Authentication Configuration File                   +
 # ===================================================                   +
 #                                                                       +
 # Refer to the "Client Authentication" section in the PostgreSQL        +
 # documentation for a complete description of this file.  A short       +
 # synopsis follows.                                                     +
 #                                                                       +
 # This file controls: which hosts are allowed to connect, how clients   +
 # are authenticated, which PostgreSQL user names they can use, which    +
 # databases they can access.  Records take one of these forms:          +
 #                                                                       +
 # local      DATABASE  USER  METHOD  [OPTIONS]                          +
 # host       DATABASE  USER  ADDRESS  METHOD  [OPTIONS]                 +
 # hostssl    DATABASE  USER  ADDRESS  METHOD  [OPTIONS]                 +
 # hostnossl  DATABASE  USER  ADDRESS  METHOD  [OPTIONS]                 +
 #                                                                       +
 # (The uppercase items must be replaced by actual values.)              +
 #                                                                       +
 # The first field is the connection type: "local" is a Unix-domain      +
 # socket, "host" is either a plain or SSL-encrypted TCP/IP socket,      +
 # "hostssl" is an SSL-encrypted TCP/IP socket, and "hostnossl" is a     +
 # plain TCP/IP socket.                                                  +
 #                                                                       +
 # DATABASE can be "all", "sameuser", "samerole", "replication", a       +
 # database name, or a comma-separated list thereof. The "all"           +
 # keyword does not match "replication". Access to replication           +
 # must be enabled in a separate record (see example below).             +
 #                                                                       +
 # USER can be "all", a user name, a group name prefixed with "+", or a  +
 # comma-separated list thereof.  In both the DATABASE and USER fields   +
 # you can also write a file name prefixed with "@" to include names     +
 # from a separate file.                                                 +
 #                                                                       +
 # ADDRESS specifies the set of hosts the record matches.  It can be a   +
 # host name, or it is made up of an IP address and a CIDR mask that is  +
 # an integer (between 0 and 32 (IPv4) or 128 (IPv6) inclusive) that     +
 # specifies the number of significant bits in the mask.  A host name    +
 # that starts with a dot (.) matches a suffix of the actual host name.  +
 # Alternatively, you can write an IP address and netmask in separate    +
 # columns to specify the set of hosts.  Instead of a CIDR-address, you  +
 # can write "samehost" to match any of the server's own IP addresses,   +
 # or "samenet" to match any address in any subnet that the server is    +
 # directly connected to.                                                +
 #                                                                       +
 # METHOD can be "trust", "reject", "md5", "password", "scram-sha-256",  +
 # "gss", "sspi", "ident", "peer", "pam", "ldap", "radius" or "cert".    +
 # Note that "password" sends passwords in clear text; "md5" or          +
 # "scram-sha-256" are preferred since they send encrypted passwords.    +
 #                                                                       +
 # OPTIONS are a set of options for the authentication in the format     +
 # NAME=VALUE.  The available options depend on the different            +
 # authentication methods -- refer to the "Client Authentication"        +
 # section in the documentation for a list of which options are          +
 # available for which authentication methods.                           +
 #                                                                       +
 # Database and user names containing spaces, commas, quotes and other   +
 # special characters must be quoted.  Quoting one of the keywords       +
 # "all", "sameuser", "samerole" or "replication" makes the name lose    +
 # its special character, and just match a database or username with     +
 # that name.                                                            +
 #                                                                       +
 # This file is read on server startup and when the server receives a    +
 # SIGHUP signal.  If you edit the file on a running system, you have to +
 # SIGHUP the server for the changes to take effect, run "pg_ctl reload",+
 # or execute "SELECT pg_reload_conf()".                                 +
 #                                                                       +
 # Put your actual configuration here                                    +
 # ----------------------------------                                    +
 #                                                                       +
 # If you want to allow non-local connections, you need to add more      +
 # "host" records.  In that case you will also need to make PostgreSQL   +
 # listen on a non-local interface via the listen_addresses              +
 # configuration parameter, or via the -i or -h command line switches.   +
                                                                         +
 # CAUTION: Configuring the system for local "trust" authentication      +
 # allows any local user to connect as any PostgreSQL user, including    +
 # the database superuser.  If you do not trust all your local users,    +
 # use another authentication method.                                    +
                                                                         +
                                                                         +
 # TYPE  DATABASE        USER            ADDRESS                 METHOD  +
                                                                         +
 # "local" is for Unix domain socket connections only                    +
 local   all             all                                     trust   +
 # IPv4 local connections:                                               +
 host    all             all             127.0.0.1/32            trust   +
 # IPv6 local connections:                                               +
 host    all             all             ::1/128                 trust   +
 # Allow replication connections from localhost, by a user with the      +
 # replication privilege.                                                +
 local   replication     all                                     trust   +
 host    replication     all             127.0.0.1/32            trust   +
 host    replication     all             ::1/128                 trust   +
 host  all mydb  ::1/128                 trust                           +
(1 row)

All you need to do from now on is to reload the configuration and you’re done:

postgres=# select pg_reload_conf();
 pg_reload_conf 
----------------
 t
(1 row)
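
pg_hba_file_rules is also a convenient way to verify that the modified file parses cleanly, as the view reports parsing problems in its error column:

postgres=# select line_number, error from pg_hba_file_rules where error is not null;
 line_number | error 
-------------+-------
(0 rows)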

Of course: Use with caution! Keep in mind that the rules in pg_hba.conf are matched top down, and that a plain COPY of a table does not guarantee any particular row order, so verify the written file carefully before relying on it.


Windocks and K8s support

Fri, 2019-06-28 00:33

I recently got the 4.08 update from the Windocks team and I was very excited to evaluate some of the new features. The first cool one I want to present in this blog concerns the Kubernetes support for deploying Windocks containers, which will definitely make my application deployments easier. Let's say you want to deploy an application that is tied to a Windocks container for SQL Server. In a previous blog post I explained why we are using Windocks in our context. With previous versions of Windocks, we had to write custom scripts to deploy applications on K8s that are tied to a Windocks container. With the new version 4.08, this process is simplified because both the applications and their related Windocks containers are directly deployable on K8s by using a YAML deployment file.

In fact, the new way consists in deploying a Windocks SQL Server proxy on K8s that works in conjunction with a Windocks server. Once the SQL Server proxy is deployed, a corresponding Windocks container spins up with its specific parameters, as shown in the picture below:

 

First of all, in order to secure access between K8s and the Windocks server, authentication is required and we need to provide credential information that will be stored in the proxy-secrets secret in K8s. The SA password is also included in this secret and will be used to set up the SA account when the Windocks container spins up.

$ kubectl create secret generic proxy-secrets --from-literal=WINDOCKS_REQUIRED_USERNAME='clustadmin' --from-literal=WINDOCKS_REQUIRED_PASSWORD='StrongPassword' --from-literal=WINDOCKS_REQUIRED_CONTAINER_SAPASSWORD='sa_password'

 

The next step consists in deploying the Windocks SQL proxy with the specific environment variables, including WINDOCKS_REQUIRED_HOSTNAME (Windocks server name or IP address), WINDOCKS_REQUIRED_IMAGE_NAME (Windocks image used as the base for containers) and WINDOCKS_SQL_PROXY_OPTIONAL_LISTENING_PORT (optional).

  • The Windocks SQL Proxy YAML file
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: windocks-sql-proxy-secure 
  labels:
    app: sqlproxy-secure 
spec:
  replicas: 1 
  template:
    metadata:
      labels:
        app: sqlproxy-secure 
        tier: frontend
    spec:
      containers:
      - name: sqlproxy-secure-app 
        image: windocks/windocks-sql-server-proxy 
        imagePullPolicy: Always
        ports:
        - name: tcp-proxy
          containerPort: 3087
        - name: tls-proxy
          containerPort: 3088
        envFrom:
          - secretRef:
              name: proxy-secrets
        env:
          - name: PROJECT_ID
            value: project_id_for_GKE_deployment_optional
          - name: WINDOCKS_REQUIRED_HOSTNAME
            value: xx.xxx.xxx.xxx
          - name: WINDOCKS_REQUIRED_IMAGE_NAME
            value: 2012_ci
          - name: WINDOCKS_SQL_PROXY_OPTIONAL_LISTENING_PORT
            value: "3087"

 

If we want to make the SQL Proxy pod accessible from outside, a service is needed, although depending on the context this is not mandatory. Note that you may also use a TLS connection to secure the network between K8s and the Windocks server.

  • The Windocks service YAML file
apiVersion: v1
kind: Service
metadata:
  name: windocks-sql-proxy-secure
  labels:
    app: sqlproxy-secure
    tier: frontend
spec:
  sessionAffinity: ClientIP
  type: LoadBalancer
  ports:
  - port: 3087
    name: tcp-proxy-secure-service
    targetPort: 3087
  - port: 3088
    name: tls-proxy-secure-service
    targetPort: 3088
  selector:
    app: sqlproxy-secure
    tier: frontend

 

Let's give it a try on my Azure infrastructure, which includes an AKS cluster and a Windocks server installed in an Azure VM. I also took the opportunity to create my own Helm chart from the YAML files provided by the Windocks team; it will make my deployments easier for sure. Here is the command I used to deploy my Windocks Helm chart on my AKS cluster:

$ helm install --name windocks2012 --namespace dmk --set Windocks.Image=2012_ci --set Windocks.Port=3089 --set Windocks.PortSSL=3090 .

 

The deployment is performed in a specific namespace named dmk, and the 2012_ci image is used as the base image for my Windocks container. I will be able to connect to my Windocks container on port 3089 through the SQL Proxy deployed on K8s. After a few seconds the following resources were deployed within my dmk namespace, including a Windocks SQL Proxy pod and the Windocks SQL Proxy service.

$ kubectl get all -n dmk
NAME                                                                  READY   STATUS    RESTARTS   AGE
pod/windocks2012-sqlproxy-securewindocks-sql-proxy-secure-56fb8694m   1/1     Running   0          13m

NAME                                                            TYPE           CLUSTER-IP     EXTERNAL-IP     PORT(S)
                 AGE
service/backend                                                 ClusterIP      10.0.126.154   <none>          80/TCP
                 8d
service/windocks2012-sqlproxy-securewindocks-sql-proxy-secure   LoadBalancer   10.0.252.235   xx.xx.xxx.xxx   3089:30382/TCP,3090:30677/TCP   44m

NAME                                                                    DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/windocks2012-sqlproxy-securewindocks-sql-proxy-secure   1         1         1            1           44m

NAME                                                                               DESIRED   CURRENT   READY   AGE
replicaset.apps/windocks2012-sqlproxy-securewindocks-sql-proxy-secure-56fbdb5c96   1         1         1       44m

 

Once deployed, the SQL proxy redirects all connections from port 3089 to the container port, after spinning up the corresponding Windocks container on the Windocks server. We may get some details by taking a look at the SQL Proxy logs on K8s. As a reminder, the container port is allocated dynamically by the Windocks server by default, and the SQL proxy picks it up automatically for the connection redirection.

…
Valid response for creating Windocks container
Container id is b1201aaaba3b4cd047953b624e541e26500024e42e6381936fc7b526b5596a99
Container port is 10001
Setting up tcp server
redirecting connections from 127.0.0.1:3089 to xx.xxx.xxx.xxx:10001 
…

 

Let's try to connect by using mssql-cli with the external IP of the SQL Proxy service and port 3089. The connection redirect is effective and I can interact with my Windocks container on local port 10001:

master> SELECT top 1 c.local_net_address, c.local_tcp_port
....... FROM sys.dm_exec_connections as c; 
+---------------------+------------------+
| local_net_address   | local_tcp_port   |
|---------------------+------------------|
| 172.18.0.5          | 10001            |
+---------------------+------------------+

 

The Windocks container for SQL Server brought up my 3 test databases as expected:

master> \ld+
+-------------------+-------------------------+-----------------------+------------------------------+
| name              | create_date             | compatibility_level   | collation_name               |
|-------------------+-------------------------+-----------------------+------------------------------|
| master            | 2003-04-08 09:13:36.390 | 110                   | SQL_Latin1_General_CP1_CI_AS |
| tempdb            | 2019-06-27 20:04:04.273 | 110                   | SQL_Latin1_General_CP1_CI_AS |
| model             | 2003-04-08 09:13:36.390 | 110                   | SQL_Latin1_General_CP1_CI_AS |
| msdb              | 2012-02-10 21:02:17.770 | 110                   | SQL_Latin1_General_CP1_CI_AS |
| AdventureWorksDbi | 2019-06-27 20:04:03.537 | 100                   | Latin1_General_100_CS_AS     |
| ApplixEnterprise  | 2019-06-27 20:04:04.477 | 90                    | SQL_Latin1_General_CP1_CI_AS |
| dbi_tools         | 2019-06-27 20:04:05.153 | 100                   | French_CS_AS                 |
+-------------------+-------------------------+-----------------------+------------------------------+

 

From the Windocks server I can get a picture of the provisioned containers. The interesting one in our case is referenced by the name k8s-windocks2012/xxxx:

PS F:\WINDOCKS\SQL2012> docker ps
CONTAINER ID        IMAGE               COMMAND             CREATED             STATUS              PORTS               NAMES
e9dbe5556b2f        2012_ci             ""                  29 minutes ago      Stopped             10002/              dab/Windocks-id:31432367-c744-4ae3-8248-cb3fb3d2792e
b1201aaaba3b        2012_ci             ""                  13 minutes ago      Started             10001/              k8s-windocks2012/Windocks-id:cfa58c38-d168-4c04-b4c8-12b0552b93ad

 

Well, in a nutshell: a feature we will certainly consider integrating into our Azure DevOps pipeline. Stay tuned, other blog posts will come later.

See you!

 

 

 


Work Agile in a GxP-regulated environment

Thu, 2019-06-27 09:24

On 4 June 2019 I followed an invitation to

wega-it’s Know-how & Networking Breakfast 2 2019 on “Agile Validation in GxP Projects”.

So they were to discuss Agility in the context of GxP regulation.

I had some earlier exposure to various kinds of compliance roles and topics, and my current work environment is in the highly regulated Pharma industry. So I was really wondering (and hoping to learn) how you can possibly bring the two points of view of Agility and GxP regulation together. The Literaturhaus Basel saw a very special type of literature that day: not a formal presentation but a role-play performance between these two viewpoints, represented by Evelyne Daniel, an experienced GxP validation expert, and Mathias Fuchs, an equally experienced Scrum master, both from wega-IT. A very nice idea, very appropriate for the topic!

What is Compliance (GxP) and what is Agile?

Typically in GxP compliance we work along the so-called V-model. In its rigidity and intended plannability it corresponds largely to the waterfall model of traditional software development. Opposed to this, the Agile Manifesto (2001) criticizes the very basics of these traditional ways of working. Remember the iconic claims like “Working software over comprehensive documentation” and “Responding to change over following a plan”. But how would you ever get acceptance in the compliance world without full documentation and planning!? When I quickly browsed the internet, I found a multitude of statements and proposals which would certainly merit a separate blog post. For this time, I will give a quick summary of the wega breakfast presentation and some touch points with our current working environment in the Pharma industry.

Although in my current work environment we are not actually driving GxP Validation projects, we are still subject to the very tight GxP regulation. In the processes of Change and Release Management this is reflected in the rigid rules for testing and documentation, to name just the most obvious examples. The background, of course, is the definition of Compliance and its Validation: the goal is to “establish documented evidence” to assure compliance, quality etc. These requirements hold independently of the quality, completeness or even up-to-date status of the pre-defined processes and rules! Call this inflexible and cumbersome! Any adaptation (update!) of the processes and rules has to go through complicated, formal administrative processes and is hence very slow. Consider this in our fast-moving (not only IT) world!

What is an MVP?

A nice play of words was interjected in the stage discussion: the acronym MVP has a very clear meaning as a basic concept for both sides, just it is not the same: MVP = Master Validation Plan (in GxP Validation) versus Minimal Viable Product (in Agile or Lean Software Development).

How to bring them together?

Now how to bring the core aspirations of Agile development, like customer focus, flexibility and speed, into the Compliance world? A first inevitable step in the V-model world: break up the (dead?) lock between a complete finalization of the User Requirements Specification and the setup of a complete Validation Plan prescribing all Qualification criteria (IQ, OQ, PQ). The Definition of Done (DoD) plays a major role when trying to cut the end-to-end Development-Validation elephant into smaller pieces. Inclusion of Validation into the “daily” Development activities is another must, instead of adding Validation only at the end of the Development phases. Yet another core principle from the Agile side is ensuring team maturity and mindset. Much-hailed diversity is opposed to purely compliance-oriented expert teams, striving for innovation and creativity in the team.

WEGA breakfast - Agile Validation in GxP projects

Some basic approaches

The final answer on how to methodically combine, or maybe rather “emulsify”, Agility and Compliance Validation comes as no surprise: there is no one-size-fits-all method. Rather, three obvious basic approaches were presented.

  1. introducing Agility right between the left (Specifications) and the right (Qualifications) arms of the V-model, probably using some kind of piloting or prototyping
  2. including Validation into the Agile Development, almost doing Validation in each Agile sprint
  3. appending V-model Validation at the end of an Agile development.

The above-mentioned end-to-end Development-to-Validation elephant has to be broken into smaller, more manageable units. Each specific project situation will have its own possible and best way to do this.

Think innovative and creative!

Thanks to wega-informatik (www.wega-it.com) for organizing this creative and informative event.

 


Oracle event: Inside the head of a hacker & how to protect your sensitive data

Wed, 2019-06-26 10:23

For once a blog post in French, since it concerns an Oracle event held in French.
Today at Oracle Switzerland in Geneva there was a presentation about Oracle security.
The idea was to put ourselves in a hacker's shoes in order to better understand their approach, and then to look at how to protect the data in an Oracle database.
Understanding the intentions and the means used by hackers to achieve their goals helps us fight them better.

The session was presented by Hakim Loumi, EMEA DB Security PM.
First, the speaker showed an estimate of the growth of data by 2025. Wow, 175 ZB!

And how this data can be sensitive

The number of attacks worldwide is impressive

Hakim also showed why hacking has become one of the most significant scourges. In the slide below, for an investment of less than $10, a complete identity sells for an average of $240. What a return on investment!

Obviously, the database is a prime target for attacks

The speaker then presented the main tools provided by Oracle, most of them requiring the Advanced Security Option:

Data Masking
Data Redaction
Oracle Firewall
Database Vault

And to finish, this nice slide

Conclusion

The event was really interesting. Through simple examples and anecdotes, the speaker captured the attention of the audience. The main takeaway from this presentation is that data security is not only the DBA's business: it is a whole chain of procedures involving everyone in the company.


PostgreSQL partitioning (8): Sub-partitioning

Tue, 2019-06-18 04:19

We are slowly coming to the end of this little series about partitioning in PostgreSQL. In the last post we had a look at indexing and constraints, and today we will have a look at sub-partitioning. Sub-partitioning means you go one step further and partition the partitions as well. It is not required to read all the posts of this series to follow this one, but if you want to, here they are:

  1. PostgreSQL partitioning (1): Preparing the data set
  2. PostgreSQL partitioning (2): Range partitioning
  3. PostgreSQL partitioning (3): List partitioning
  4. PostgreSQL partitioning (4) : Hash partitioning
  5. PostgreSQL partitioning (5): Partition pruning
  6. PostgreSQL partitioning (6): Attaching and detaching partitions
  7. PostgreSQL partitioning (7): Indexing and constraints

Coming back to our range partitioned table this is how it looks like currently:

postgres=# \d+ traffic_violations_p
                                      Partitioned table "public.traffic_violations_p"
         Column          |          Type          | Collation | Nullable | Default | Storage  | Stats target | Description 
-------------------------+------------------------+-----------+----------+---------+----------+--------------+-------------
 seqid                   | text                   |           |          |         | extended |              | 
 date_of_stop            | date                   |           |          |         | plain    |              | 
 time_of_stop            | time without time zone |           |          |         | plain    |              | 
 agency                  | text                   |           |          |         | extended |              | 
 subagency               | text                   |           |          |         | extended |              | 
 description             | text                   |           |          |         | extended |              | 
 location                | text                   |           |          |         | extended |              | 
 latitude                | numeric                |           |          |         | main     |              | 
 longitude               | numeric                |           |          |         | main     |              | 
 accident                | text                   |           |          |         | extended |              | 
 belts                   | boolean                |           |          |         | plain    |              | 
 personal_injury         | boolean                |           |          |         | plain    |              | 
 property_damage         | boolean                |           |          |         | plain    |              | 
 fatal                   | boolean                |           |          |         | plain    |              | 
 commercial_license      | boolean                |           |          |         | plain    |              | 
 hazmat                  | boolean                |           |          |         | plain    |              | 
 commercial_vehicle      | boolean                |           |          |         | plain    |              | 
 alcohol                 | boolean                |           |          |         | plain    |              | 
 workzone                | boolean                |           |          |         | plain    |              | 
 state                   | text                   |           |          |         | extended |              | 
 vehicletype             | text                   |           |          |         | extended |              | 
 year                    | smallint               |           |          |         | plain    |              | 
 make                    | text                   |           |          |         | extended |              | 
 model                   | text                   |           |          |         | extended |              | 
 color                   | text                   |           |          |         | extended |              | 
 violation_type          | text                   |           |          |         | extended |              | 
 charge                  | text                   |           |          |         | extended |              | 
 article                 | text                   |           |          |         | extended |              | 
 contributed_to_accident | boolean                |           |          |         | plain    |              | 
 race                    | text                   |           |          |         | extended |              | 
 gender                  | text                   |           |          |         | extended |              | 
 driver_city             | text                   |           |          |         | extended |              | 
 driver_state            | text                   |           |          |         | extended |              | 
 dl_state                | text                   |           |          |         | extended |              | 
 arrest_type             | text                   |           |          |         | extended |              | 
 geolocation             | point                  |           |          |         | plain    |              | 
 council_districts       | smallint               |           |          |         | plain    |              | 
 councils                | smallint               |           |          |         | plain    |              | 
 communities             | smallint               |           |          |         | plain    |              | 
 zip_codes               | smallint               |           |          |         | plain    |              | 
 municipalities          | smallint               |           |          |         | plain    |              | 
Partition key: RANGE (date_of_stop)
Partitions: traffic_violations_p_2013 FOR VALUES FROM ('2013-01-01') TO ('2014-01-01'),
            traffic_violations_p_2014 FOR VALUES FROM ('2014-01-01') TO ('2015-01-01'),
            traffic_violations_p_2015 FOR VALUES FROM ('2015-01-01') TO ('2016-01-01'),
            traffic_violations_p_2016 FOR VALUES FROM ('2016-01-01') TO ('2017-01-01'),
            traffic_violations_p_2017 FOR VALUES FROM ('2017-01-01') TO ('2018-01-01'),
            traffic_violations_p_2018 FOR VALUES FROM ('2018-01-01') TO ('2019-01-01'),
            traffic_violations_p_2019 FOR VALUES FROM ('2019-01-01') TO ('2020-01-01'),
            traffic_violations_p_2020 FOR VALUES FROM ('2020-01-01') TO ('2021-01-01'),
            traffic_violations_p_2021 FOR VALUES FROM ('2021-01-01') TO ('2022-01-01'),
            traffic_violations_p_default DEFAULT

Let's assume you expect traffic violations to grow exponentially in 2022, because more and more cars will be on the road, and more cars mean more traffic violations. To be prepared for that you do not only want to partition by year, but also by month. In other words: add a new partition for 2022, but sub-partition it by month. First of all you need a new partition for 2022 that is itself partitioned as well:

create table traffic_violations_p_2022
partition of traffic_violations_p
for values from ('2022-01-01') to ('2023-01-01') partition by range(date_of_stop);

Now we can add the twelve monthly partitions to the just created partitioned partition (a way to generate these statements is sketched right after them):

create table traffic_violations_p_2022_jan
partition of traffic_violations_p_2022
for values from ('2022-01-01') to ('2022-02-01');

create table traffic_violations_p_2022_feb
partition of traffic_violations_p_2022
for values from ('2022-02-01') to ('2022-03-01');

create table traffic_violations_p_2022_mar
partition of traffic_violations_p_2022
for values from ('2022-03-01') to ('2022-04-01');

create table traffic_violations_p_2022_apr
partition of traffic_violations_p_2022
for values from ('2022-04-01') to ('2022-05-01');

create table traffic_violations_p_2022_may
partition of traffic_violations_p_2022
for values from ('2022-05-01') to ('2022-06-01');

create table traffic_violations_p_2022_jun
partition of traffic_violations_p_2022
for values from ('2022-06-01') to ('2022-07-01');

create table traffic_violations_p_2022_jul
partition of traffic_violations_p_2022
for values from ('2022-07-01') to ('2022-08-01');

create table traffic_violations_p_2022_aug
partition of traffic_violations_p_2022
for values from ('2022-08-01') to ('2022-09-01');

create table traffic_violations_p_2022_sep
partition of traffic_violations_p_2022
for values from ('2022-09-01') to ('2022-10-01');

create table traffic_violations_p_2022_oct
partition of traffic_violations_p_2022
for values from ('2022-10-01') to ('2022-11-01');

create table traffic_violations_p_2022_nov
partition of traffic_violations_p_2022
for values from ('2022-11-01') to ('2022-12-01');

create table traffic_violations_p_2022_dec
partition of traffic_violations_p_2022
for values from ('2022-12-01') to ('2023-01-01');
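
By the way, if typing twelve almost identical statements feels tedious, psql can generate and execute them for you. Here is a small sketch using format() and \gexec which should create the very same partitions (an alternative to the statements above, not part of the original session):

select format( 'create table traffic_violations_p_2022_%s partition of traffic_violations_p_2022 for values from (%L) to (%L)'
             , to_char(d,'mon')
             , d
             , d + interval '1 month' )
  from generate_series ( date('2022-01-01')
                       , date('2022-12-01')
                       , interval '1 month' ) as g(d)
\gexec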

Looking at psql’s output when we describe the partitioned table, not much has changed: just the keyword “PARTITIONED” shows up beside our new partition for 2022:

postgres=# \d+ traffic_violations_p
                                      Partitioned table "public.traffic_violations_p"
         Column          |          Type          | Collation | Nullable | Default | Storage  | Stats target | Description 
-------------------------+------------------------+-----------+----------+---------+----------+--------------+-------------
 seqid                   | text                   |           |          |         | extended |              | 
 date_of_stop            | date                   |           |          |         | plain    |              | 
 time_of_stop            | time without time zone |           |          |         | plain    |              | 
 agency                  | text                   |           |          |         | extended |              | 
 subagency               | text                   |           |          |         | extended |              | 
 description             | text                   |           |          |         | extended |              | 
 location                | text                   |           |          |         | extended |              | 
 latitude                | numeric                |           |          |         | main     |              | 
 longitude               | numeric                |           |          |         | main     |              | 
 accident                | text                   |           |          |         | extended |              | 
 belts                   | boolean                |           |          |         | plain    |              | 
 personal_injury         | boolean                |           |          |         | plain    |              | 
 property_damage         | boolean                |           |          |         | plain    |              | 
 fatal                   | boolean                |           |          |         | plain    |              | 
 commercial_license      | boolean                |           |          |         | plain    |              | 
 hazmat                  | boolean                |           |          |         | plain    |              | 
 commercial_vehicle      | boolean                |           |          |         | plain    |              | 
 alcohol                 | boolean                |           |          |         | plain    |              | 
 workzone                | boolean                |           |          |         | plain    |              | 
 state                   | text                   |           |          |         | extended |              | 
 vehicletype             | text                   |           |          |         | extended |              | 
 year                    | smallint               |           |          |         | plain    |              | 
 make                    | text                   |           |          |         | extended |              | 
 model                   | text                   |           |          |         | extended |              | 
 color                   | text                   |           |          |         | extended |              | 
 violation_type          | text                   |           |          |         | extended |              | 
 charge                  | text                   |           |          |         | extended |              | 
 article                 | text                   |           |          |         | extended |              | 
 contributed_to_accident | boolean                |           |          |         | plain    |              | 
 race                    | text                   |           |          |         | extended |              | 
 gender                  | text                   |           |          |         | extended |              | 
 driver_city             | text                   |           |          |         | extended |              | 
 driver_state            | text                   |           |          |         | extended |              | 
 dl_state                | text                   |           |          |         | extended |              | 
 arrest_type             | text                   |           |          |         | extended |              | 
 geolocation             | point                  |           |          |         | plain    |              | 
 council_districts       | smallint               |           |          |         | plain    |              | 
 councils                | smallint               |           |          |         | plain    |              | 
 communities             | smallint               |           |          |         | plain    |              | 
 zip_codes               | smallint               |           |          |         | plain    |              | 
 municipalities          | smallint               |           |          |         | plain    |              | 
Partition key: RANGE (date_of_stop)
Partitions: traffic_violations_p_2013 FOR VALUES FROM ('2013-01-01') TO ('2014-01-01'),
            traffic_violations_p_2014 FOR VALUES FROM ('2014-01-01') TO ('2015-01-01'),
            traffic_violations_p_2015 FOR VALUES FROM ('2015-01-01') TO ('2016-01-01'),
            traffic_violations_p_2016 FOR VALUES FROM ('2016-01-01') TO ('2017-01-01'),
            traffic_violations_p_2017 FOR VALUES FROM ('2017-01-01') TO ('2018-01-01'),
            traffic_violations_p_2018 FOR VALUES FROM ('2018-01-01') TO ('2019-01-01'),
            traffic_violations_p_2019 FOR VALUES FROM ('2019-01-01') TO ('2020-01-01'),
            traffic_violations_p_2020 FOR VALUES FROM ('2020-01-01') TO ('2021-01-01'),
            traffic_violations_p_2021 FOR VALUES FROM ('2021-01-01') TO ('2022-01-01'),
            traffic_violations_p_2022 FOR VALUES FROM ('2022-01-01') TO ('2023-01-01'), PARTITIONED,
            traffic_violations_p_default DEFAULT

This is where the new functions in PostgreSQL 12 become very handy:

postgres=# select * from pg_partition_tree('traffic_violations_p');
             relid             |        parentrelid        | isleaf | level 
-------------------------------+---------------------------+--------+-------
 traffic_violations_p          |                           | f      |     0
 traffic_violations_p_default  | traffic_violations_p      | t      |     1
 traffic_violations_p_2013     | traffic_violations_p      | t      |     1
 traffic_violations_p_2014     | traffic_violations_p      | t      |     1
 traffic_violations_p_2015     | traffic_violations_p      | t      |     1
 traffic_violations_p_2016     | traffic_violations_p      | t      |     1
 traffic_violations_p_2017     | traffic_violations_p      | t      |     1
 traffic_violations_p_2018     | traffic_violations_p      | t      |     1
 traffic_violations_p_2019     | traffic_violations_p      | t      |     1
 traffic_violations_p_2020     | traffic_violations_p      | t      |     1
 traffic_violations_p_2021     | traffic_violations_p      | t      |     1
 traffic_violations_p_2022     | traffic_violations_p      | f      |     1
 traffic_violations_p_2022_jan | traffic_violations_p_2022 | t      |     2
 traffic_violations_p_2022_feb | traffic_violations_p_2022 | t      |     2
 traffic_violations_p_2022_mar | traffic_violations_p_2022 | t      |     2
 traffic_violations_p_2022_apr | traffic_violations_p_2022 | t      |     2
 traffic_violations_p_2022_may | traffic_violations_p_2022 | t      |     2
 traffic_violations_p_2022_jun | traffic_violations_p_2022 | t      |     2
 traffic_violations_p_2022_jul | traffic_violations_p_2022 | t      |     2
 traffic_violations_p_2022_aug | traffic_violations_p_2022 | t      |     2
 traffic_violations_p_2022_sep | traffic_violations_p_2022 | t      |     2
 traffic_violations_p_2022_oct | traffic_violations_p_2022 | t      |     2
 traffic_violations_p_2022_nov | traffic_violations_p_2022 | t      |     2
 traffic_violations_p_2022_dec | traffic_violations_p_2022 | t      |     2
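
pg_partition_tree is not the only new introspection function in PostgreSQL 12: pg_partition_root and pg_partition_ancestors arrived with the same release. For the tree above they should return something like this (a quick sketch, not from the original session):

postgres=# select pg_partition_root('traffic_violations_p_2022_jan');
  pg_partition_root   
----------------------
 traffic_violations_p
(1 row)

postgres=# select relid from pg_partition_ancestors('traffic_violations_p_2022_jan');
             relid             
-------------------------------
 traffic_violations_p_2022_jan
 traffic_violations_p_2022
 traffic_violations_p
(3 rows)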

To verify that data is routed correctly to the sub-partitions, let’s add some data for 2022:

insert into traffic_violations_p (date_of_stop)
       select * from generate_series ( date('01-01-2022')
                                     , date('12-31-2022')
                                     , interval '1 day' );

If we did the partitioning correctly we should see data in the new partitions:

postgres=# select count(*) from traffic_violations_p_2022_nov;
 count 
-------
    30
(1 row)

postgres=# select count(*) from traffic_violations_p_2022_dec;
 count 
-------
    31
(1 row)

postgres=# select count(*) from traffic_violations_p_2022_feb;
 count 
-------
    28
(1 row)
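
Instead of querying each partition separately, the distribution over all the monthly partitions can also be checked in one go via tableoid (a sketch; with the 365 rows inserted above, each month should report its number of days):

postgres=# select tableoid::regclass, count(*) from traffic_violations_p_2022 group by 1 order by 1;
           tableoid            | count 
-------------------------------+-------
 traffic_violations_p_2022_jan |    31
 traffic_violations_p_2022_feb |    28
 traffic_violations_p_2022_mar |    31
...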

Here we go. Of course you could go even further and sub-partition the monthly partitions by day or week. You can also partition by list and then sub-partition the list partitions by range, or partition by range and then sub-partition by list, e.g.:

postgres=# create table traffic_violations_p_list_dummy partition of traffic_violations_p_list for values in ('dummy') partition by range (date_of_stop);
CREATE TABLE
postgres=# create table traffic_violations_p_list_dummy_2022 partition of traffic_violations_p_list_dummy for values from ('2022-01-01') to ('2023-01-01');
CREATE TABLE
postgres=# insert into traffic_violations_p_list (seqid, violation_type, date_of_stop) values (-1,'dummy',date('2022-12-01'));
INSERT 0 1
postgres=# select date_of_stop,violation_type from traffic_violations_p_list_dummy_2022;
 date_of_stop | violation_type 
--------------+----------------
 2022-12-01   | dummy
(1 row)

That’s it for sub-partitioning. In the final post we will look at some corner cases with partitioning in PostgreSQL.

This article PostgreSQL partitioning (8): Sub-partitioning appeared first on Blog dbi services.

Connecting to a Repository via a Dynamically Edited dfc.properties File (part II)

Sun, 2019-06-16 13:37

This is part II of the 2-part article; see part I of this article for the background and the wrapper’s implementation.

Testing

We will test on the host machine named docker, which hosts 2 containers, container01 and container011. All 3 machines run a repository: dmtest on docker (in short, dmtest@docker:1489), dmtest01 on container01 (dmtest01@container01:1489 internally, dmtest01@container01:2489 externally) and dmtest01 on container011 (dmtest01@container011:1489 internally, dmtest01@container011:5489 externally). Incidentally, the enhanced syntax is also a good way to uniquely identify the repositories.
The current dfc.properties file on the host docker:

$ grep docbroker /app/dctm/config/dfc.properties
dfc.docbroker.host[0]=docker
dfc.docbroker.port[0]=1489

This is used for the local docbase dmtest.
Let’s tag all the docbases for easy identification later:

$ iapi dmtest -Udmadmin -Pdmadmin <<eoq
retrieve,c,dm_docbase_config
set,c,l,title
dmtest on host VM
save,c,l
eoq

Idem from within container01 with its default dfc.properties file:

$ iapi dmtest01 -Udmadmin -Pdmadmin <<eoq
retrieve,c,dm_docbase_config
set,c,l,title
dmtest01 created silently on container01
save,c,l
eoq

Idem from within container011:

$ iapi dmtest01 -Udmadmin -Pdmadmin <<eoq
retrieve,c,dm_docbase_config
set,c,l,title
dmtest01 created silently on container011
save,c,l
eoq

First, let's access container01.dmtest01 from the containers' host VM with the current dfc.properties file:

$ idql dmtest01 -Udmadmin -Pdmadmin
 
 
OpenText Documentum idql - Interactive document query interface
Copyright (c) 2018. OpenText Corporation
All rights reserved.
Client Library Release 16.4.0070.0035
 
 
Connecting to Server using docbase dmtest01
Could not connect
[DM_DOCBROKER_E_NO_SERVERS_FOR_DOCBASE]error: "The DocBroker running on host (docker:1489) does not know of a server for the specified docbase (dmtest01)"

As expected, it does not work because container01.dmtest01 does not project to the host’s docbroker. Now, let’s turn to widql:

$ ./widql dmtest01@docker:2489 -Udmadmin -Pdmadmin --keep <<eoq
select title from dm_docbase_config
go
eoq
OpenText Documentum idql - Interactive document query interface
Copyright (c) 2018. OpenText Corporation
All rights reserved.
Client Library Release 16.4.0070.0035
 
 
Connecting to Server using docbase dmtest01
[DM_SESSION_I_SESSION_START]info: "Session 0100c350800011bb started for user dmadmin."
 
 
Connected to OpenText Documentum Server running Release 16.4.0000.0248 Linux64.Oracle
title
------------------------------------------
dmtest01 created silently on container01

It works.
We used --keep, therefore the dfc.properties file has changed:

$ grep docbroker /app/dctm/config/dfc.properties
dfc.docbroker.host[0]=docker
dfc.docbroker.port[0]=2489

Indeed.
That docbase can also be reached by the container’s IP address and internal port 1489:

$ docker exec -it container01 ifconfig eth0 | head -3
eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
inet 192.168.33.101 netmask 255.255.255.0 broadcast 192.168.33.255
ether 02:42:c0:a8:21:65 txqueuelen 0 (Ethernet)
 
$ ./widql dmtest01@192.168.33.101:1489 -Udmadmin -Pdmadmin <<eoq
select title from dm_docbase_config
go
eoq
...
Connecting to Server using docbase dmtest01
[DM_SESSION_I_SESSION_START]info: "Session 0100c350800011b5 started for user dmadmin."
...
title
------------------------------------------
dmtest01 created silently on container01

Is the local dmtest docbase still reachable?

$ idql dmtest -Udmadmin -Pdmadmin
...
Could not connect
[DM_DOCBROKER_E_NO_SERVERS_FOR_DOCBASE]error: "The DocBroker running on host (docker:2489) does not know of a server for the specified docbase (dmtest)"

Not with that changed dfc.properties file and the standard tools. But by using our nifty little tool:

$ ./widql dmtest@docker:1489 -Udmadmin -Pdmadmin <<eoq
select title from dm_docbase_config
go
eoq
...
Connected to OpenText Documentum Server running Release 16.4.0080.0129 Linux64.Oracle
title
----------------------
dmtest on host VM

Fine!
Is container011.dmtest01 reachable now?

$ ./widql dmtest01 -Udmadmin -Pdmadmin <<eoq
select title from dm_docbase_config
go
eoq
...
Connecting to Server using docbase dmtest01
...
Connected to OpenText Documentum Server running Release 16.4.0000.0248 Linux64.Oracle
title
-------------------------------------------
dmtest01 created silently on container01

This is container01.dmtest01, not the one we want, i.e. the one on container011.
Note that ./widql was called without the extended syntax so it invoked the standard idql directly.
Let’s try again:

$ ./widql dmtest01@docker:5489 -Udmadmin -Pdmadmin <<eoq
select title from dm_docbase_config
go
eoq
...
Connecting to Server using docbase dmtest01
[DM_SESSION_I_SESSION_START]info: "Session 0100c3508000059e started for user dmadmin."
...
title
------------------------------------------
dmtest01 created silently on container011

Here we go, it works!
The same using the container’s IP address and its docbroker’s internal port:

$ docker exec -it container011 ifconfig eth0 | head -3
eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
inet 192.168.33.104 netmask 255.255.255.0 broadcast 192.168.33.255
ether 02:42:c0:a8:21:68 txqueuelen 0 (Ethernet)
 
$ ./widql dmtest01@192.168.33.104:5489 -Udmadmin -Pdmadmin <<eoq
select title from dm_docbase_config
go
eoq
...
Connecting to Server using docbase dmtest01
[DM_SESSION_I_SESSION_START]info: "Session 0100c35080000598 started for user dmadmin."
...
title
------------------------------------------
dmtest01 created silently on container011

Now try the same connection but with --append and --keep:

$ ./widql dmtest01@docker:5489 -Udmadmin -Pdmadmin --append --keep <<eoq
select title from dm_docbase_config
go
eoq
...
Connecting to Server using docbase dmtest01
...
Connected to OpenText Documentum Server running Release 16.4.0000.0248 Linux64.Oracle
title
-------------------------------------------
dmtest01 created silently on container011

What is the content of dfc.properties now?

$ grep docbroker /app/dctm/config/dfc.properties
dfc.docbroker.host[0]=docker
dfc.docbroker.port[0]=2489
dfc.docbroker.host[1]=docker
dfc.docbroker.port[1]=5489

Both options have been taken into account as expected.
Let’s try to reach the VM host’s repository:

$ ./widql dmtest -Udmadmin -Pdmadmin <<eoq
select title from dm_docbase_config
go
eoq
...
Connecting to Server using docbase dmtest
Could not connect
[DM_DOCBROKER_E_NO_SERVERS_FOR_DOCBASE]error: "The DocBroker running on host (docker:2489) does not know of a server for the specified docbase (dmtest)"

Specify the docbroker’s host and the --verbose option:

$ ./widql dmtest@docker -Udmadmin -Pdmadmin --verbose <<eoq
select title from dm_docbase_config
go
eoq
 
changing to docker:...
requested changes:
# removed: dfc.docbroker.host[0]=docker
# removed: dfc.docbroker.port[0]=2489
# removed: dfc.docbroker.host[1]=docker
# removed: dfc.docbroker.port[1]=5489
# added: dfc.docbroker.host[0]=docker
# added: dfc.docbroker.port[0]=2489
diffs:
12,13d11
< dfc.docbroker.host[1]=docker
< dfc.docbroker.port[1]=5489
calling original: /app/dctm/product/16.4/bin/idql dmtest -Udmadmin -Pdmadmin
...
Connecting to Server using docbase dmtest
Could not connect
[DM_DOCBROKER_E_NO_SERVERS_FOR_DOCBASE]error: "The DocBroker running on host (docker:2489) does not know of a server for the specified docbase (dmtest)"

Since the port was not specified, the wrapper took the first port found in the dfc.properties file to supply the missing value, i.e. 2489, which is incorrect as dmtest only projects to docker:1489.
Use an unambiguous command now:

$ ./widql dmtest@docker:1489 -Udmadmin -Pdmadmin --verbose <<eoq
select title from dm_docbase_config
go
eoq
 
changing to docker:1489...
requested changes:
# removed: dfc.docbroker.host[0]=docker
# removed: dfc.docbroker.port[0]=2489
# removed: dfc.docbroker.host[1]=docker
# removed: dfc.docbroker.port[1]=5489
# added: dfc.docbroker.host[0]=docker
# added: dfc.docbroker.port[0]=1489
diffs:
11,13c11
< dfc.docbroker.port[0]=2489
< dfc.docbroker.host[1]=docker
< dfc.docbroker.port[1]=5489
---
> dfc.docbroker.port[0]=1489
calling original: /app/dctm/product/16.4/bin/idql dmtest -Udmadmin -Pdmadmin
...
Connecting to Server using docbase dmtest
...
Connected to OpenText Documentum Server running Release 16.4.0080.0129 Linux64.Oracle
title
--------------------
dmtest on host VM

Looks OK.
Let’s try wdmawk now. But first, here is the test code twdmawk.awk:

$ cat twdmawk.awk 
BEGIN {
   print "repo_target=" repo_target, "docbase=" docbase
   session = dmAPIGet("connect," docbase ",dmadmin,dmadmin")
   print dmAPIGet("getmessage," session)
   dmAPIGet("retrieve," session ",dm_docbase_config")
   print dmAPIGet("get," session ",l,title")
   dmAPIExec("disconnect," session)
   exit(0)
}

Line 3 displays the two variables automatically passed to dmawk by the wrapper, repo_target and docbase.
The test script connects to the docbase whose name was silently passed as a command-line parameter by wdmawk through the -v option, after it extracted it from the given target parameter docbase[@host[:port]], as illustrated below with the --verbose option.
Let’s see the invocation for the repository on the host VM:

$ ./wdmawk dmtest@docker:1489 -f ./twdmawk.awk --verbose
changing to docker:1489...
requested changes:
# removed: dfc.docbroker.host[0]=docker
# removed: dfc.docbroker.port[0]=2489
# removed: dfc.docbroker.host[1]=docker
# removed: dfc.docbroker.port[1]=5489
# added: dfc.docbroker.host[0]=docker
# added: dfc.docbroker.port[0]=1489
diffs:
11,13c11
< dfc.docbroker.port[0]=2489
< dfc.docbroker.host[1]=docker
< dfc.docbroker.port[1]=5489
---
> dfc.docbroker.port[0]=1489
calling original: /app/dctm/product/16.4/bin/dmawk -v docbase=dmtest -f ./twdmawk.awk
repo_target= docbase=dmtest
[DM_SESSION_I_SESSION_START]info: "Session 0100c3508000367b started for user dmadmin."
 
 
dmtest on host VM

Let’s access container01’s repository:

$ ./wdmawk dmtest01@docker:2489 -f ./twdmawk.awk
 
[DM_SESSION_I_SESSION_START]info: "Session 0100c35080001202 started for user dmadmin."
 
 
dmtest01 created silently on container01

A small typo in the port number and …

dmadmin@docker:~$ ./wdmawk dmtest01@docker:3489 -f ./twdmawk.awk
 
[DFC_DOCBROKER_REQUEST_FAILED] Request to Docbroker "docker:3489" failed
 
[DM_SESSION_E_RPC_ERROR]error: "Server communication failure"
 
java.net.ConnectException: Connection refused (Connection refused)

Note the stupid error message “… Connection refused …”, very misleading when investigating a problem. It’s just that there is nobody listening on that port.
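
By the way, a quick way to tell a dead port from a typo before blaming Documentum is a plain TCP probe, e.g. with netcat if it is installed (the exact wording of the output depends on the nc flavor):

$ nc -zv docker 3489
nc: connect to docker port 3489 (tcp) failed: Connection refused
$ nc -zv docker 5489
Connection to docker 5489 port [tcp/*] succeeded!
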
Let’s access the container011’s repository:

dmadmin@docker:~$ ./wdmawk dmtest01@docker:5489 -f ./twdmawk.awk
 
[DM_SESSION_I_SESSION_START]info: "Session 0100c350800005ef started for user dmadmin."
 
 
dmtest01 created silently on container011

Effect of the -v option:

dmadmin@docker:~$ ./wdmawk -v dmtest@docker:1489 -f ./twdmawk.awk --verbose
...
calling original: /app/dctm/product/16.4/bin/dmawk -v repo_target=dmtest@docker:1489 -v docbase=dmtest -f ./twdmawk.awk
repo_target=dmtest@docker:1489 docbase=dmtest
[DM_SESSION_I_SESSION_START]info: "Session 0100c35080003684 started for user dmadmin."
 
 
dmtest on host VM

A repo_target parameter with the extended syntax has been passed to dmawk.
Let’s now quickly check the wrapper from within the containers.
Container01
The host’s docbase:

[dmadmin@container01 scripts]$ ./wiapi dmtest@docker:1489 -Udmadmin -Pdmadmin<<eoq
retrieve,c,dm_docbase_config
get,c,l,title
eoq
...
Connecting to Server using docbase dmtest
...
dmtest on host VM

The container011’s docbase:

[dmadmin@container01 scripts]$ ./wiapi dmtest01@container011:1489 -Udmadmin -Pdmadmin<<eoq
retrieve,c,dm_docbase_config
get,c,l,title
eoq
...
Connecting to Server using docbase dmtest01
...
dmtest01 created silently on container011
...

Container011
The host’s docbase:

dmadmin@container011 scripts]$ ./wiapi dmtest@docker:1489 -Udmadmin -Pdmadmin<<eoq
retrieve,c,dm_docbase_config
get,c,l,title
eoq
...
Connecting to Server using docbase dmtest
...
Connected to OpenText Documentum Server running Release 16.4.0080.0129 Linux64.Oracle
...
dmtest on host VM
...

The docbase on container01:

dmadmin@container011 scripts]$ ./wiapi dmtest01@container01:1489 -Udmadmin -Pdmadmin<<eoq
retrieve,c,dm_docbase_config
get,c,l,title
eoq
...
...
Connecting to Server using docbase dmtest01
...
dmtest01 created silently on container01
...

Let’s briefly test the usage of the sourced configuration file. Here is a snippet of the file shown earlier in this article:

# repository connection configuration file;
# must be sourced prior so the environment variables can be resolved;
# this is an enhancement over the dfc.properties file syntax used by the dctm_wrapper utility:
# docbroker.host[i]=...
# docbroker.port[i]=...
# it supports several syntaxes:
# docbroker only definition [[docbroker_host]:[port]];
#    usage: ./widql dmtest@$dmtest
# full definition docbase[@[docbroker_host]:[port]]
#    usage: ./widql $test
# alternate ':' separator docbase:[[docbroker_host]:[docbroker_port]];
#    usage: ./widql $dmtestVM
# alias literal;
#    usage: ./widql test
# in order to resolve alias literals, the wrapper will source the configuration file by itself;
...
# container011.dmtest01;
# docbroker only definition docbroker_host:port;
d_dmtest011=container011:5489
di_dmtest011=192.168.33.104:1489
# full definition;
f_dmtest011=dmtest01@container011:2489
fip_dmtest011=dmtest01@192.168.33.104:1489

With a good naming convention, the variables can easily be remembered, which saves a lot of typing too.
Note on lines 9 and 10 how the whole extended target name can be specified, including the repository name.
A few tests:

dmadmin@docker:~$ ./widql dmtest01@$d_dmtest011 -Udmadmin -Pdmadmin --verbose
current_cmd=[./widql dmtest01@container011:5489 -Udmadmin -Pdmadmin --verbose] ...
 
dmadmin@docker:~$ ./widql dmtest01@$dip_dmtest011 -Udmadmin -Pdmadmin --verbose
current_cmd=[./widql dmtest01@192.168.33.104:1489 -Udmadmin -Pdmadmin --verbose] ...
 
dmadmin@docker:~$ ./widql $f_dmtest011 -Udmadmin -Pdmadmin --verbose
current_cmd=[./widql dmtest01@container011:2489 -Udmadmin -Pdmadmin --verbose] ...
 
dmadmin@docker:~$ ./widql $fip_dmtest011 -Udmadmin -Pdmadmin --verbose
current_cmd=[./widql dmtest01@192.168.33.104:1489 -Udmadmin -Pdmadmin --verbose] ...

The variables have been expanded by the shell prior to entering the wrapper; no programming effort was needed here, which is always appreciated.

Possible Enhancements

As shown previously, the alternate configuration file lists aliases for the couples docbroker:port and even repository@docbroker:port. In passing, the wrapper also supports the variant repository:docbroker:port.
Now, in order to better match the Documentum syntax, is it possible to be even more transparent by removing dollar signs, colons and at-signs while still using the extended syntax? E.g.:

$ ./widql dmtest -Udmadmin ....

Yes it is. The trick here is to first look up the alias in the configuration file, which incidentally becomes mandatory now, and re-execute the program with the alias resolved. As we are all lazy coders, we will not explicitly code the lookup but instead rely on the shell: the wrapper will source the file, resolve the target and re-execute itself.
If the alias has not been defined in the file, then the wrapper considers it as the name of a repository and falls back to the usual command-line tools.
A good thing is that no new format has to be introduced in the file as the target is still the name of an environment variable.
Since the changes are really minimal, let’s do it. Hereafter is the diff output showing the changes from the listing in part I:

> # this variable points to the target repositories alias file and defaults to repository_connections.aliases;
> REPO_ALIAS=${REPO_ALIAS:-~/repository_connections.aliases}
> 
107a111
> [[ bVerbose -eq 1 ]] && echo "current configuration file=[$REPO_ALIAS]"
225,227c229,241
<    if [[ $bVerbose -eq 1 ]]; then
<       echo "no change to current $dfc_config file"
<       echo "calling original: $DM_HOME/bin/${dctm_program} $*"
---
>    [[ -f $REPO_ALIAS ]] && . $REPO_ALIAS
>    definition=${!1}
>    [[ $bVerbose -eq 1 ]] && echo "alias lookup in $REPO_ALIAS: $1 = $definition"
>    if [[ ! -z $definition ]]; then
>       new_cmd=${current_cmd/$1/$definition}
>       [[ $bVerbose -eq 1 ]] && echo "invoking $new_cmd"
>       exec $new_cmd
>    else
>       if [[ $bVerbose -eq 1 ]]; then
>          echo "no change to current $dfc_config file"
>          echo "calling original: $DM_HOME/bin/${dctm_program} $*"
>       fi
>       $DM_HOME/bin/${dctm_program} $*
229d242
<    $DM_HOME/bin/${dctm_program} $*

On line 11, the target configuration file pointed to by the REPO_ALIAS environment variable gets sourced if it exists; the variable is defined on line 2 and defaults to ~/repository_connections.aliases, but can be changed before calling the wrapper.
Note on line 12 how bash can dereference a variable containing the name of another variable to get the latter’s value (indirect expansion); nice touch.
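
For readers unfamiliar with that bash feature, here is a tiny standalone illustration of indirect expansion, outside of the wrapper:

$ f_dmtest=dmtest@docker:1489
$ set -- f_dmtest
$ # $1 now holds the name of a variable; ${!1} resolves to that variable's value;
$ echo "${!1}"
dmtest@docker:1489
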
To apply the patch in-place, save the diffs above in diff-file and run the following command:

patch old-file < diff-file

Testing
For conciseness, the tests below only show how the target is resolved. The actual connection has already been tested abundantly earlier.

dmadmin@docker:~$ ./widql f_dmtest -Udmadmin -Pdmadmin --verbose
current_cmd=[./widql f_dmtest -Udmadmin -Pdmadmin --verbose] alias lookup in /home/dmadmin/repository_connections.aliases: f_dmtest = dmtest@docker:1489
invoking ./widql dmtest@docker:1489 -Udmadmin -Pdmadmin --verbose
current_cmd=[/home/dmadmin/widql dmtest@docker:1489 -Udmadmin -Pdmadmin --verbose] ...
dmadmin@docker:~$ ./widql fip_dmtest01 -Udmadmin -Pdmadmin --verbose
current_cmd=[./widql fip_dmtest01 -Udmadmin -Pdmadmin --verbose] alias lookup in /home/dmadmin/repository_connections.aliases: fip_dmtest01 = dmtest01@192.168.33.2:1489
invoking ./widql dmtest01@192.168.33.2:1489 -Udmadmin -Pdmadmin --verbose
current_cmd=[/home/dmadmin/widql dmtest01@192.168.33.2:1489 -Udmadmin -Pdmadmin --verbose] ...
dmadmin@docker:~$ ./widql fip_dmtest011 -Udmadmin -Pdmadmin --verbose
current_cmd=[./widql fip_dmtest011 -Udmadmin -Pdmadmin --verbose] alias lookup in /home/dmadmin/repository_connections.aliases: fip_dmtest011 = dmtest01@192.168.33.3:1489
invoking ./widql dmtest01@192.168.33.3:1489 -Udmadmin -Pdmadmin --verbose
current_cmd=[/home/dmadmin/widql dmtest01@192.168.33.3:1489 -Udmadmin -Pdmadmin --verbose]

Note how the targets are cleaner now, no curly little fancy shell characters in front.

Conclusion

As I was testing this little utility, I was surprised to realize how comfortable and natural its usage is. It actually feels better to add the docbroker’s host and port than to stop at the docbase name, probably because it makes the intended repository absolutely unambiguous. The good thing is that it is almost invisible, except for its invocation, but even this can be smoothed out by using command aliases or renaming the symlinks.
When one has to work with identically named docbases or with clones existing in different environments, dctm-wrapper brings real relief. And it was quick and easy to put together too.
As it modifies an essential configuration file, it is mainly aimed at developers or administrators on their machine, but then those constitute the targeted audience anyway.
As always, if you have any ideas for some utility that could benefit us all, please do not hesitate to suggest them in the comment section. Feedback is welcome too, of course.

This article Connecting to a Repository via a Dynamically Edited dfc.properties File (part II) appeared first on Blog dbi services.

Connecting to a Repository via a Dynamically Edited dfc.properties File (part I)

Sun, 2019-06-16 13:36
Connecting to a Repository via a Dynamically Edited dfc.properties File

Now that we have containerized content servers, it is very easy, maybe too easy, to create new repositories. Their creation is still not any faster (whether they are containerized or not is irrelevant here) but, given a configuration file, it just takes one command to instantiate an image into a running container with working repositories in it. Thus, during experimentation and testing, out of laziness or in a hurry, one can quickly end up with several containers with identically named repositories, e.g. dmtest01, with an identically named docbroker, e.g. docbroker01. Now, suppose one wants to connect to the docbase dmtest01 running on the 3rd such container using the familiar command-line tools idql/iapi/dmawk. How then to select that particular instance of dmtest01 among all the others?
To make the test case precise, let’s say that we are using a custom bridge network to link the containers together on the docker host (appropriately named docker), which is a VirtualBox VM running an Ubuntu flavor. The metal also natively runs the same Ubuntu distro. It looks complicated but actually matches the common on-premises infrastructure type where the metal is an ESX or equivalent, its O/S is the hypervisor and the VMs run a Redhat or Suse distro. As this is a local testing environment, no DNS or network customizations have been introduced, save for the custom bridge.
We want to reach a remote repository either from container to container or from container to host or from host to container.
The problem here stems from the lack of flexibility in the docbroker/dfc.properties file mechanism and no network fiddling can work around this.

It’s All in The dfc.properties File

Containers have distinct host names, so it suffices to edit just this field in their local dfc.properties file. Their files may all look like the one below:

dfc.docbroker.host[0]=container01
dfc.docbroker.port[0]=1489
dfc.docbroker.host[1]=docker
dfc.docbroker.port[1]=1489
dfc.docbroker.host[2]=container011
dfc.docbroker.port[2]=1489
dfc.docbroker.host[3]=container02
dfc.docbroker.port[3]=1489

In effect, the custom bridge network embeds a DNS for all the attached containers, so their host names are known to each other (but not to the host, so from there the IP addresses must be used or the host’s /etc/hosts file must be edited). The docbroker ports are the ones inside the containers and all have the same value 1489 because the containers were created out of the same configuration files. The docker entry has been added to the containers’ /etc/hosts file via the --add-host= clause of the docker run command.
For the containers’ host machine, where a Documentum repository has been installed too, the dfc.properties file could look like this one:

dfc.docbroker.host[0]=docker
dfc.docbroker.port[0]=1489
dfc.docbroker.host[1]=docker
dfc.docbroker.port[1]=2489
dfc.docbroker.host[2]=docker
dfc.docbroker.port[2]=3489
dfc.docbroker.host[3]=docker
dfc.docbroker.port[3]=5489

Here, the host name is the one of the VM where the containers sit and is the same for all the containers. The port numbers differ because they are the containers’ external ports, which are published on the host VM and mapped to the respective docbroker’s internal port, 1489. Since the containers share the same custom network, their host names, IP addresses and external ports must all be different when running the image, or docker won’t allow it.
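
For illustration, the docker run command for such a setup could look like the sketch below; the bridge network name, the image name and the host alias IP are assumptions, only container011’s IP address and the 5489-to-1489 port mapping come from this setup:

docker run -d --name container011 --hostname container011 \
       --network custom_bridge --ip 192.168.33.104 \
       --add-host docker:192.168.33.1 \
       -p 5489:1489 \
       <content_server_image>
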
Alternatively, the container’s IP addresses and internal docbroker’s ports could be used directly too if one is too lazy to declare the containers’ host names in the host’s /etc/hosts file, which is generally the case when testing:

dfc.docbroker.host[0]=docker 
dfc.docbroker.port[0]=1489
dfc.docbroker.host[1]=192.168.33.101
dfc.docbroker.port[1]=1489
dfc.docbroker.host[2]=192.168.33.102
dfc.docbroker.port[2]=1489
dfc.docbroker.host[3]=192.168.33.104
dfc.docbroker.port[3]=1489

The host’s custom network will take care of routing the traffic into the respective containers.
Can you spot the problem now? As all the containers contain identically named repositories (for clarity, let’s say that we are looking for the docbase dmtest01), the first contacted docbroker in that file will always reply successfully because there is indeed a dmtest01 docbase in that container, and consequently one will always be directed to the docbase container01.dmtest01. If one wants to contact container03.dmtest01, this configuration won’t allow it. One would need to edit it and move the target container03 host to the first position, which is OK until one wants to access container02.dmtest01 or go back to container01.dmtest01.
This situation has been existing forever but containers make it more obvious because they make it so much easier to have repository homonyms.
So, is there a simpler way to work around this limitation than editing a configuration file back and forth or giving different names to the containerized repositories?

A Few Reminders

Documentum has made quite a lot of design decisions inspired by the Oracle DBMS but their implementation is far from offering the same level of flexibility and power, and this is often irritating. Let’s consider the connectivity for example. Simply speaking, Oracle’s SQL*Net configuration relies mainly on a tnsnames.ora file for the connectivity (it can also use a centralized ldap server but let’s keep it simple). This file contains entries used to contact listeners and get the information needed to connect to the related database. Minimal data to provide in the entries are the listener’s hostname and port, and the database sid or service name, e.g.:

...
ORCL =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = db)(PORT = 1521))
    (CONNECT_DATA =
      (SERVER = DEDICATED)
      (SERVICE_NAME = db_service)
    )
  )
...

A connection to the database db_service can simply be requested as follows:

sqlplus scott@orcl

orcl is the SQL*Net alias for the database served by db_service. It works like an index in a lookup table, the tnsnames.ora file.
Compare this with a typical dfc.properties file, e.g. /home/dmadmin/documentum/shared/config/dfc.properties:

...
dfc.docbroker.host[0]=docker
dfc.docbroker.port[0]=1489
dfc.docbroker.host[1]=dmtest
dfc.docbroker.port[1]=1489
...

Similarly, instead of contacting listeners, we have here docbrokers. A connection to the docbase dmtest can be requested as follows:

idql dmtest

dmtest is the target repository. It is not a lookup key in the dfc.properties file. Unlike the tnsnames.ora file and its aliases, there is an indirection here and the dfc.properties file does not directly tell where to find a certain repository; it just lists the docbrokers to be queried about it sequentially until the first one that knows the repository (or a homonym thereof) answers. If the returned target docbase is the wrong homonym, tough luck, it will not be reachable unless the order of the entries is changed. Repositories announce themselves to the docbrokers by “projecting” themselves. If two repositories with the same name project to the same docbroker, no error is raised but the docbroker can return unexpected results, e.g. one may end up in the unintended docbase.
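
As a side note, whether a repository currently projects to a given docbroker can be checked by querying the docbroker’s docbase map, e.g. with the dmqdocbroker utility that ships with the content server (a sketch, output omitted):

$ dmqdocbroker -t docker -p 1489 -c getdocbasemap
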
Another major difference is that with Oracle but not with Documentum, it is possible to bypass the tnsnames.ora file by specifying the connection data in-line, e.g. on the command-line:

sqlplus scott@'(DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = db)(PORT = 1521))
    (CONNECT_DATA =
      (SERVER = DEDICATED)
      (SERVICE_NAME = db_service)
    )
  )'

This can be very useful when editing the local, official tnsnames.ora file is not allowed, and sometimes faster than setting $TNS_ADMIN to an accessible local directory and editing a private tnsnames.ora file there.
This annoyance is even more frustrating because Documentum’s command-line tools do support a similar syntax but for a different purpose:

idql repository[.service][@machine] [other parameters]

While this syntax is logically useful to access the service (akin to an Oracle instance but for a HA Documentum installation), it is used in a distributed repository environment to contact a particular node’s docbroker; however, it still does not work if that docbroker is not first declared in the local dfc.properties file.
Last but not least, one more reason to be frustrated is that the DfCs do allow to choose a specific docbroker when opening a session, as illustrated by the jython snippet below:

import traceback
import com.documentum.fc.client as DFCClient
import com.documentum.fc.common as DFCCommon

docbroker_host = "docker"
docbroker_port = "1489"
docbase = "dmtest"
username = "dmadmin"
password = "dmadmin"
print("attempting to connect to " + docbase + " as " + username + "/" + password + " via docbroker on host " + docbroker_host + ":" + docbroker_port)
try:
  client = DFCClient.DfClient.getLocalClient()

  config = client.getClientConfig()
  config.setString ("primary_host", docbroker_host)
  config.setString ("primary_port", docbroker_port)

  logInfo = DFCCommon.DfLoginInfo()
  logInfo.setUser(username)
  logInfo.setPassword(password)
  docbase_session = client.newSession(docbase, logInfo)

  if docbase_session is not None:
    print("Connected !")
  else:
    print("Couldn't connect !")
except Exception:
  traceback.print_exc()

Content of dfc.properties:

$ cat documentum/shared/config/dfc.properties
dfc.date_format=dd.MM.yyyy HH:mm:ss

Execution:

$ jython ./test.jy
...
attempting to connect to dmtest as dmadmin/dmadmin via docbroker docker
Connected !

Despite a dfc.properties file devoid of any docbroker definition, the connection was successful. Unfortunately, this convenience has not been carried over to the vegetative command-line tools.
While we can dream and hope for those tools to be resurrected and a backport miracle to happen (are you listening, OTX?), the next best thing is to tackle this shortcoming ourselves and implement as unobtrusive a solution as possible. Let’s see how.

A Few Proposals

Currently, one has to manually edit the local dfc.properties file, but this is tedious to say the least, because changes must sometimes be done twice, forwards and rolled back if the change is only temporary. To avoid this, we could add at once to our local dfc.properties file all the machines that host repositories of interest, but this file could quickly grow large and it won’t solve the case of repository homonyms. The situation would become quite unmanageable, although an environment variable such as the late DMCL_CONFIG (appropriately revamped, e.g. to DFC_PROPERTIES_CONFIG, for the full path name of the dfc.properties file to use) could help organize those entries. But there is no such variable any longer for the command-line tools (those tools have stopped evolving since CS v6.x), although there is a property for the DfCs to pass to the JVM at startup, -Ddfc.properties.file, or even the #include clause in the dfc.properties file, or playing with the $CLASSPATH. But there is a better way.
What about an on-the-fly, transparent, behind-the-scenes editing of the dfc.properties file to support a connection syntax similar to Oracle’s in-line one?
Proposal 1
Let’s specify the address of the docbroker of interest directly on the command-line, as follows:

$ idql dmtest01@container03:3489
or
$ idql dmtest01@192.168.33.104:3489

This is more akin to the Oracle in-line connection syntax above.
Proposal 2
An alternative could be to use an Oracle’s tnsnames.ora-like configuration file such as the one below (and (in (keeping (with (the (lisp spirit)))))):

dmtest01 = ((docbroker.host = container01) (docbroker.port = 1489))
dmtest02 = ((docbroker.host = container02) (docbroker.port = 1489))
dmtest03 = ((docbroker.host = container03) (docbroker.port = 1489))

and to use it thusly:

$ idql dmtest01@dmtest03

dmtest03 is looked up in the configuration file and replaced on the command-line by its definition.
Proposal 3
With a more concise configuration file that can also be sourced:

dmtest01=container01:1489
dmtest02=container02:1489
dmtest03=container03:1489

and used as follows:

$ export REPO_ALIAS=~/repository_connections.aliases
$ . $REPO_ALIAS
$ ./widql dmtest01@$dmtest03

$dmtest03 is directly fetched from the environment after the configuration file has been sourced, which is equivalent to a lookup. Since the variable substitution occurs at the shell level, it comes free of charge.
With a bit more generalization, it is possible to merge the three proposals together:

$ idql repository(@host_literal:port_number) | @$target

In other words, one can either provide literally the full connection information or provide a variable which will be resolved by the shell from a configuration file to be sourced preliminarily.
Let’s push the configuration file a bit farther and define complete aliases up to the repository name like this:

dmtest=dmtest@docker:1489
or even so:
dmtest=dmtest:docker:1489

Usage:

$ ./widql $dmtest

The shell will expand the alias with its definition. The good thing is that the definition styles can be mixed and matched to suit one’s fancy. Example of a configuration file:

# must be sourced prior so the environment variables can be resolved;
# this is an enhancement over the dfc.properties file syntax used by the dctm_wrapper utility:
# docbroker.host[i]=...
# docbroker.port[i]=...
# it supports several syntaxes:
# docbroker only definition docbroker_host:port;
#    usage: ./widql dmtest@$dmtest
# full definition docbase[@[docbroker_host]:[port]]
#    usage: ./widql $test
# alternate ':' separator docbase:[docbroker_host]:[port];
#    usage: ./widql $dmtestVM
# alias literal;
#    usage: ./widql test
# in order to resolve alias literals, the wrapper will source the configuration file by itself;

# docker.dmtest;
# docbroker only definition;
d_dmtest=docker:1489
# full definition;
f_dmtest=dmtest@docker:1489
# alternate ':' separator;
a_dmtest=dmtest:docker:1489

# container01.dmtest01;
# docbroker only definition;
d_dmtest01=container01:2489
dip_dmtest01=192.168.33.101:1489
# full definition;
f_dmtest01=dmtest01@container01:2489
fip_dmtest01=dmtest01@192.168.33.101:1489
# alternate ':' separator;
a_dmtest01=dmtest01:container01:2489
aip_dmtest01=dmtest01:192.168.33.101:2489

# container011.dmtest01;
# docbroker only definition;
d_dmtest011=container011:5489
dip_dmtest011=192.168.33.104:1489
# full definition;
f_dmtest011=dmtest01@container011:2489
fip_dmtest011=dmtest01@192.168.33.104:1489
# alternate ':' separator;
a_dmtest011=dmtest01:container011:2489
aip_dmtest011=dmtest01:192.168.33.104:2489

Lines 5 to 14 explain all the supported target syntaxes, with a new one presented on lines 12 to 14 that will be explained later in the paragraph entitled Possible Enhancements.
Using lookup variables in a configuration file makes things easier when the host names are hard to remember, because better mnemonic aliases can be defined for them. Also, as they are looked up, the entries can be in any order. They must obviously be unique or they will mask each other. A consistent naming convention may be required to easily find one’s way through this file.
Whenever the enhanced syntax is used, it triggers an automatic editing of the dfc.properties file and the specified connection information is inserted as dfc.docbroker.host and dfc.docbroker.port entries. Then, the corresponding Documentum tool gets invoked and finally the original dfc.properties file is restored when the tool exits. The trigger here is the presence of the @ or : characters in the first command-line parameter.
This would also cover the case when an entry is simply missing from the dfc.properties file. Actually, from the point of view of the command-line tools, all the connection definitions could be handed over to the new configuration file and even removed from dfc.properties, as they are dynamically added to and deleted from the latter file as needed.
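
In its simplest form, the trigger test could look like the sketch below (hypothetical; the actual parsing in the script presented below also splits the target into its components with gawk):

target="$1"
if [[ "$target" == *@* || "$target" == *:* ]]; then
   # enhanced syntax: edit dfc.properties before calling the real tool;
   echo "extended target [$target], dfc.properties will be edited"
else
   # plain docbase name: pass everything through untouched;
   echo "standard target [$target], calling the original tool as is"
fi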

The Implementation

The above proposal looks pretty easy and fun to implement, so let’s give it a shot. In this article, I’ll present a little script, dctm-wrapper, that builds upon the above @syntax to first edit the configuration file on demand (that’s the dynamic part of the article’s title) and then invoke the standard idql, iapi or dmawk utilities, with an optional rollback of the change on exiting.
Since it is not possible to bypass the dfc.properties files, we will dynamically modify it whenever the @host syntax is used from a command-line tool. As we do not want to replace the official idql, iapi and dmawk tools, yet, we will create new ones, say widql, wiapi and wdmawk (where w stands for wrapper). Those will be symlinks to the real script, dctm-wrapper.sh, which will be able to invoke either idql, iapi or dmawk according to how it was called (bash’s $0 contains the name of the symlink that was invoked, even though its target is always dctm-wrapper.sh, see the script’s source at the next paragraph).
The script dctm-wrapper.sh will support the following syntax:

$ ./widql docbase[@[host][:port]] [other standard parameters] [--verbose] [--append] [--keep]
$ ./wiapi docbase[@[host][:port]] [other standard parameters] [--verbose] [--append] [--keep]
$ ./wdmawk [-v] docbase[@[host][:port]] [dmawk parameters] [--verbose] [--append] [--keep]

The custom parameters --verbose, --append and --keep are processed by the script and stripped off before invoking the official tools.
wdmawk is a bit special in that the native tool, dmawk, is invoked differently from iapi/idql but I felt that it too could benefit from this little hack. Therefore, in addition to the non-interactive editing of the dfc.properties file, wdmawk also passes on the target docbase name as a -v docbase=… command-line parameter (the standard way to pass parameters in awk) and removes the extended target parameter docbase[@[host][:port]] unless it is prefixed by the -v option in which case it gets forwarded through the -v repo_target= parameter. The dmawk program is then free to use them the way it likes. The repo_target parameter could have been specified on the command-line independently but the -v option can still be useful in cases such as the one below:

$ ./wdmawk docbase@docker:1489 -v repo_target=docbase@docker:1489 '{....}'

which can be shortened to

$ ./wdmawk -v docbase@docker:1489 '{....}'

If the extended target docbase parameter is present, it must be the first one.
If the ‘@’ or ‘:’ characters are missing, it means the enhanced syntax is not used and the script will not attempt to modify dfc.properties; it will pass on all the remaining parameters to the matching official tools.
When @[host][:port] is present, the dfc.properties file will be edited to accommodate the new docbroker’s parameters; all the existing couples dfc.docbroker.host/dfc.docbroker.port will either be removed (if --append is missing) or preserved (if --append is present) and a new couple entry will be appended with the given values. Obviously, if one wants to avoid the homonym trap, --append should not be used, in order to let the given docbroker be picked up as the sole entry in the property file.
When --append and --keep are present, we end up with a convenient way to add docbroker entries into the property file without manually editing it.
As the host is optional, it can be omitted and the one from the first dfc.docbroker.host[] entry will be used instead. Ditto for the port.
Normally, upon returning from the invocation of the original tools, the former dfc.properties file is restored to its original content. However, if --keep is mentioned, the rollback will not be performed and the modified file will replace the original file. The latter will still be there though, but renamed to $DOCUMENTUM_SHARED/config/dfc.properties_saved_YYYY-MM-DD_HH:MM:SS, so it will still be possible to manually roll back. --keep is mostly useful in conjunction with --append so that new docbrokers get permanently added to the configuration file.
Finally, when --verbose is specified, the changes to the dfc.properties file will be sent to stdout; a diff of both the original and the new configuration file will also be shown, along with the final command-line used to invoke the selected original tool. This helps troubleshooting possible command-line parsing issues because, as can be seen from the code, no extra effort has been put into this section.

The Code

The script below shows a possible implementation:

#!/bin/bash
# Installation:
# it should not be called directly but through one of the aliases below for the standard tools instead:
# ln -s dctm-wrapper wiapi
# ln -s dctm-wrapper widql
# ln -s dctm-wrapper wdmawk
# where the initial w stands for wrapper;
# and then:
#    ./widql ...
# $DOCUMENTUM_SHARED must obviously exist;
# Since there is no \$DOCUMENTUM_SHARED in eCS ≥ 16.4, set it to $DOCUMENTUM as follows:
#    export DOCUMENTUM_SHARED=$DOCUMENTUM
# See Usage() for details;

Usage() {
   cat - <<EoU
./widql docbase[@[host][:port]] [other standard parameters] [--verbose] [--append] [--keep]
./wiapi docbase[@[host][:port]] [other standard parameters] [--verbose] [--append] [--keep]
./wdmawk [-v] docbase[@[host][:port]] [dmawk -v parameters] [--verbose] [--append] [--keep]
E.g.:
   wiapi dmtest
or:
   widql dmtest@remote_host
or:
   widql dmtest@remote_host:1491 -Udmadmin -Pxxxx
or:
   wiapi dmtest@:1491 --append
or:
   wdmawk -v dmtest01@docker:5489 -f ./twdmawk.awk -v ...
or:
   wdmawk dmtest01@docker:2489 -f ./twdmawk.awk -v ...
or:
   wiapi dmtest@remote_host:1491 --append --keep
etc...
If --verbose is present, the changes applied to \$DOCUMENTUM[_SHARED]/config/dfc.properties are displayed.
If --append is present, a new entry is appended to the dfc.properties file, the value couple dfc.docbroker.host and dfc.docbroker.port, and the existing ones are not commented out so they are still usable;
If --append is not present, all the entries are removed prior to inserting the new one;
If --keep is present, the changed dfc.properties file is not reverted to the changed one, i.e. the changes are made permanent;
If a change of configuration has been requested, the original config file is first saved with a timestamp appended and restored on return from the standard tools, unless --keep is present in which case
the backup file is also kept so it is still possible to manually revert to the original configuration;
wdmawk invokes dmawk passing it the -v docbase=$docbase command-line parameter;
In addition, if -v docbase[@[host][:port]] is used, -v repo_target=docbase[@[host][:port]] is also passed to dmawk;
Instead of a in-line target definition, environment variables can also be used, e.g.:
   widql dmtest@$dmtestVM ...
where $dmtestVM resolves to e.g. docker:1489
or even:
   widql $test01c ...
where $test01c resolves to e.g. dmtest01@container01:1489
As the environment variable is resolved by the shell before it invokes the program, make sure it has a definition, e.g. source a configuration file;
EoU
   exit 0
}

if [[ $# -eq 0 ]]; then
   Usage
fi

# save command;
current_cmd="$0 $*"

# which original program shall possibly be called ?
dctm_program=$(basename $0); dctm_program=${dctm_program:1}
if [[ $dctm_program == "dmawk" ]]; then
   bFordmawk=1 
else
   bFordmawk=0 
fi

# look for the --verbose, --append or --keep options;
# remove them from the command-line if found so they are not passed to the standard Documentum's tools;
# the goal is to clean up the command-line from the enhancements options so it can be passed to the official tools;
bVerbose=0
bAppend=0
bKeep=0
posTarget=1
passTarget2awk=0
while true; do
   index=-1
   bChanged=0
   for i in "$@"; do
      (( index += 1 ))
      if [[ "$i" == "--verbose" ]]; then
         bVerbose=1
         bChanged=1
         break
      elif [[ "$i" == "--append" ]]; then
         bAppend=1
         bChanged=1
         break
      elif [[ "$i" == "--keep" ]]; then
         bKeep=1
         bChanged=1
         break
      elif [[ "$i" == "-v" && $bFordmawk -eq 1 && $index -eq 0 ]]; then
	 passTarget2awk=1
         bChanged=1
         break
      fi
   done
   if [[ $bChanged -eq 1 ]]; then
      set -- ${@:1:index} ${@:index+2:$#-index-1}
   else
      break
   fi
done

[[ bVerbose -eq 1 ]] && echo "current_cmd=[$current_cmd]"

target=$1
remote_info=$(echo $1 | gawk '{
   docbase = ""; hostname = ""; port = ""
   if (match($0, /@[^ \t:]*/)) {
      docbase = substr($0, 1, RSTART - 1)
      hostname = substr($0, RSTART + 1, RLENGTH - 1)
      rest = substr($0, RSTART + RLENGTH)
      if (1 == match(rest, /:[0-9]+/))
         port = substr(rest, 2, RLENGTH - 1)
   }
   else docbase = $0
}
END {
   printf("%s:%s:%s", docbase, hostname, port)
}')
docbase=$(echo $remote_info | cut -d: -f1)
hostname=$(echo $remote_info | cut -d: -f2)
port=$(echo $remote_info | cut -d: -f3)

# any modifications to the config file requested ?
if [[ ! -z $hostname || ! -z $port ]]; then
   # the dfc.properties file must be changed for the new target repository;
   dfc_config=$DOCUMENTUM_SHARED/config/dfc.properties
   if [[ ! -f $dfc_config ]]; then
      echo "$dfc_config not found"
      echo "check the \$DOCUMENTUM_SHARED environment variable"
      echo " in ≥ 16.4, set it to \$DOCUMENTUM"
      exit 1
   fi
   
   # save the current config file;
   backup_file=${dfc_config}_saved_$(date +"%Y-%m-%d_%H:%M:%S")
   cp $dfc_config ${backup_file}

   [[ $bVerbose -eq 1 ]] && echo "changing to $hostname:$port..."
   pid=$$; gawk -v hostname="$hostname" -v port="$port" -v bAppend=$bAppend -v bVerbose=$bVerbose -v bKeep=$bKeep -v pid=$$ 'BEGIN {
      bFirst_hostname = 0; first_hostname = ""
      bFirst_port     = 0 ;    first_port = ""
      max_index = -1
   }
   {
      if (match($0, /^dfc.docbroker.host\[[0-9]+\]=/)) {
         if (!hostname && !bFirst_hostname) {
            # save the first host name to be used if command-line hostname was omitted;
            bFirst_hostname = 1
            first_hostname = substr($0, RLENGTH +1)
         }
         match($0, /\[[0-9]+\]/); index_number = substr($0, RSTART + 1, RLENGTH - 2)
         if (bAppend) {
            # leave the entry;
            print $0
            if (index_number > max_index)
               max_index = index_number
         }
         else {
            # do not print it, which will remove the entry;
            if (bVerbose)
               print "# removed:", $0 > ("/tmp/tmp_" pid)
         }
      }
      else if (match($0, /^dfc.docbroker.port\[[0-9]+\]=/)) {
         if (!port && !bFirst_port) {
            # save the first port to be used if command-line port was omitted;
            bFirst_port = 1
            first_port = substr($0, RLENGTH +1)
         }
         if (bAppend)
            # leave the entry;
            print $0
         else {
            # do nothing, which will remove the entry;
            if (bVerbose)
               print "# removed:", $0 > ("/tmp/tmp_" pid)
         }
      }
      else print
   }
   END {
      if (!hostname)
         hostname = first_hostname
      if (!port)
         port = first_port
      if (bAppend)
         index_number = max_index + 1
      else
         index_number = 0
      print "dfc.docbroker.host[" index_number "]=" hostname
      print "dfc.docbroker.port[" index_number "]=" port
      if (bVerbose) {
         print "# added: dfc.docbroker.host[" index_number "]=" hostname > ("/tmp/tmp_" pid)
         print "# added: dfc.docbroker.port[" index_number "]=" port     > ("/tmp/tmp_" pid)
      }
      close("/tmp/tmp_" pid)
   }' $dfc_config > ${dfc_config}_new

   if [[ $bVerbose -eq 1 ]]; then
      echo "requested changes:"
      cat /tmp/tmp_$$
      rm /tmp/tmp_$$
      echo "diffs:"
      diff $dfc_config ${dfc_config}_new
   fi 

   mv ${dfc_config}_new $dfc_config
   shift

   if [[ $bFordmawk -eq 1 ]]; then
      docbase="-v docbase=$docbase"
      [[ $passTarget2awk -eq 1 ]] && docbase="-v repo_target=$target $docbase"
   fi
   [[ $bVerbose -eq 1 ]] && echo "calling original: $DM_HOME/bin/${dctm_program} $docbase $*"
   $DM_HOME/bin/${dctm_program} $docbase $*

   # restore original config file;
   [[ $bKeep -eq 0 ]] && mv ${backup_file} $dfc_config
else
   if [[ $bVerbose -eq 1 ]]; then
      echo "no change to current $dfc_config file"
      echo "calling original: $DM_HOME/bin/${dctm_program} $*"
   fi
   $DM_HOME/bin/${dctm_program} $*
fi

The original configuration file is always saved on entry by appending a timestamp precise to the second which, unless you’re the Flash running the command twice in the background with the option --keep but without --append, should be enough to preserve the original content.
To make the command-line parsing simpler, the script relies on the final invoked command for checking any syntax errors. Feel free to modify it and make it more robust if you need that. As said earlier, the --verbose option can help troubleshoot unexpected results here.
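As an illustration, here is what the config directory could look like after a run with --keep (a sketch, the host prompt and the timestamp obviously vary):

dmadmin@dmclient:~$ ls -1 $DOCUMENTUM_SHARED/config | grep dfc.properties
dfc.properties
dfc.properties_saved_2019-06-19_10:22:35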
See part II of this article for the tests.

The article Connecting to a Repository via a Dynamically Edited dfc.properties File (part I) appeared first on Blog dbi services.

An exotic feature in the content server: check_client_version

Sun, 2019-06-16 12:26

A few months ago, I tripped over a very mysterious problem while attempting to connect to a 7.3 CS docbase from within dqMan.
We had 3 docbases and we could connect using this client to all of them but one ! Moreover, we could connect to all three using a remote Documentum Administrator or the local idql/iapi command-line tools. Since we could connect to at least one of them with dqMan, this utility was not guilty. Also, since all three docbases accepted connections, they were all OK in this respect. Ditto for the account used, dmadmin or nominative ones; local connections were possible hence the accounts were all active and, as they could be used from within the remote DA, their identification method and password were correct too.
We tried connecting from different workstations in order to check the dqMan side, we cleared its caches, we reinstalled it, but to no avail. We checked the content server’s log, as usual nothing relevant. It was just the combination of this particular docbase AND dqMan. How strange !
So what the heck was wrong here ?
As we weren’t the only administrators of those repositories, we were more or less suspecting someone else had changed some setting, but which one ? Ok, I sort of gave it away in the title but please bear with me and read on.
I don’t remember exactly how, we were probably working in panic mode, but we eventually decided to compare the docbases’ dm_docbase_config object side by side as shown below (with some obfuscation):

paste <(iapi bad_boy -Udmadmin -Pxxx <<eoq | awk '{print substr($0, 1, 80)}'
retrieve,c,dm_docbase_config
dump,c,l
quit
eoq
) <(iapi good_boy -Udmadmin -Pxxx <<eoq | awk '{print substr($0, 1, 80)}'
retrieve,c,dm_docbase_config
dump,c,l
quit
eoq
) | column -c 30 -s $'\t' -t | tail -n +11 | head -n 48
USER ATTRIBUTES                                          USER ATTRIBUTES
  object_name                     : bad_boy                object_name                     : good_boy
  title                           : bad_boy Repository     title                           : good_boy Global Repository
  subject                         :                        subject                         :
  authors                       []:                        authors                       []: 
  keywords                      []:                        keywords                      []: 
  resolution_label                :                        resolution_label                :
  owner_name                      : bad_boy                owner_name                      : good_boy
  owner_permit                    : 7                      owner_permit                    : 7
  group_name                      : docu                   group_name                      : docu
  group_permit                    : 5                      group_permit                    : 5
  world_permit                    : 3                      world_permit                    : 3
  log_entry                       :                        log_entry                       :
  acl_domain                      : bad_boy                acl_domain                      : good_boy
  acl_name                        : dm_450xxxxx80000100    acl_name                        : dm_450xxxxx580000100
  language_code                   :                        language_code                   :
  mac_access_protocol             : nt                     mac_access_protocol             : nt
  security_mode                   : acl                    security_mode                   : acl
  auth_protocol                   :                        auth_protocol                   :
  index_store                     : DM_bad_boy_INDEX       index_store                     : DM_good_boy_INDEX
  folder_security                 : T                      folder_security                 : T
  effective_date                  : nulldate               effective_date                  : nulldate
  richmedia_enabled               : T                      richmedia_enabled               : T
  dd_locales                   [0]: en                     dd_locales                   [0]: en
  default_app_permit              : 3                      default_app_permit              : 3
  oldest_client_version           :                        oldest_client_version           :
  max_auth_attempt                : 0                      max_auth_attempt                : 0
  client_pcaching_disabled        : F                      client_pcaching_disabled        : F
  client_pcaching_change          : 1                      client_pcaching_change          : 1
  fulltext_install_locs        [0]: dsearch                fulltext_install_locs        [0]: dsearch
  offline_sync_level              : 0                      offline_sync_level              : 0
  offline_checkin_flag            : 0                      offline_checkin_flag            : 0
  wf_package_control_enabled      : F                      wf_package_control_enabled      : F
  macl_security_disabled          : F                      macl_security_disabled          : F
  trust_by_default                : T                      trust_by_default                : T
  trusted_docbases              []:                        trusted_docbases              []: 
  login_ticket_cutoff             : nulldate               login_ticket_cutoff             : nulldate
  auth_failure_interval           : 0                      auth_failure_interval           : 0
  auth_deactivation_interval      : 0                      auth_deactivation_interval      : 0
  dir_user_sync_on_demand         : F                      dir_user_sync_on_demand         : F
  check_client_version            : T                      check_client_version            : F
  audit_old_values                : T                      audit_old_values                : T
  docbase_roles                 []:                        docbase_roles                [0]: Global Registry
  approved_clients_only           : F                      approved_clients_only           : F
  minimum_owner_permit            : 2                      minimum_owner_permit            : 0
  minimum_owner_xpermit           :                        minimum_owner_xpermit           :
  dormancy_status                 :                        dormancy_status                 :

The only significant differences were the highlighted ones and the most obvious one was the attribute check_client_version: it was turned on in the bad_boy repository. Now that we finally had something to blame, the universe started making sense again ! We quickly turned this setting to false and could eventually connect to that recalcitrant docbase. But the question was still open: check against what ? What criterion was applied to refuse dqMan access to bad_boy but to allow it to good_boy ? That was still not clear, even though we could work around it.
As for who turned it on and why, that had to remain a mystery.
While we were at it, we also noticed another attribute which seemed to be related to the previous one: oldest_client_version.
Was there any other client_% attribute in dm_docbase_config ?

paste <(iapi good_boy -Udmadmin -Pdmadmin <<eoq | grep client
retrieve,c,dm_docbase_config
dump,c,l
quit
eoq) <(iapi bad_boy -Udmadmin -Pdmadmin <<eoq | grep client
retrieve,c,dm_docbase_config
dump,c,l
quit
eoq) | column -s $'\t' -t
  oldest_client_version           :      oldest_client_version           : 
  client_pcaching_disabled        : F    client_pcaching_disabled        : F
  client_pcaching_change          : 1    client_pcaching_change          : 1
  check_client_version            : F    check_client_version            : T
  approved_clients_only           : F    approved_clients_only           : F

Yes, but they looked quite harmless in the current context.
Thus, the relevant attributes here are check_client_version and oldest_client_version. Let’s discover a bit more about them.

Digging

As usual, the documentation is a bit sketchy about these attributes:

check_client_version Boolean S T means that the repository
                               servers will not accept connections
                               from clients older than the
                               version level specified in the
                               oldest_client_version property.
                               F means that the servers accept
                               connections from any client version.
                               The default is F.

oldest_client_version string(32) S Version number of the oldest
                                    Documentum client that will access
                                    this repository.
                                    This must be set manually. It is used
                                    by the DFC to determine how to
                                    store chunked XML documents. If
                                    check_client_version is set to T, then
                                    this value is also used to identify the
                                    oldest client version level that may
                                    connect to the repository.

But what is the client version ? Logically, it is the version of its DfCs or, for older clients, the version of the dmcl shared library.
So, if check_client_version is true, the client version is checked and, if it is older than the one defined in oldest_client_version, the client is forbidden to connect. That makes sense except that in our case, oldest_client_version was empty. Maybe in such a case, the client has to match exactly the content server’s DfC version ? As dqMan was either using the dmcl40.dll library or an old DfC version, it was rejected. Let’s verify this hypothesis with a 16.4 target repository.
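
Before moving on, note that a docbase’s current settings can be checked with a simple DQL query, e.g. (a quick sketch in the same idql style as above):

idql dmtest -Udmadmin -Pdmadmin <<eoq
select check_client_version, oldest_client_version from dm_docbase_config;
go
eoq
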
Connecting from an ancient 5.3 client
We exhumed an old 5.3 CS installation to use its client part with the default configuration in the target docbase:

dmadmin@osboxes:~/documentum53$ idql dmtest -Udmadmin -Pdmadmin
 
 
Documentum idql - Interactive document query interface
(c) Copyright Documentum, Inc., 1992 - 2004
All rights reserved.
Client Library Release 5.3.0.115 Linux
 
 
Connecting to Server using docbase dmtest
[DM_SESSION_I_SESSION_START]info: "Session 0100c35080003913 started for user dmadmin."
 
 
Connected to Documentum Server running Release 16.4.0080.0129 Linux64.Oracle

Fine so far.
Let’s activate the dm_docbase_config.check_client_version in the target:

retrieve,c,dm_docbase_config
...
set,c,l,check_client_version
SET> T
...
OK
API> save,c,l
...
[DM_DCNFG_E_CANT_SAVE]error: "Cannot save dmtest docbase_config."
 
[DM_DCNFG_E_SET_OLDEST_CLIENT_VERSION_FIRST]error: "The docbase_config object attribute oldest_client_version has to be set before setting attribute check_client_version to T."

Interesting. At that time, this attribute was empty and yet the check_client_version was active. Is this constraint new in 16.4 or did the unknown administrator hack around it ? As I don’t have a 7.x repository available right now, I cannot test this point.
Let’s play by the rules and set oldest_client_version:

reset,c,l
set,c,l,oldest_client_version
16.4
save,c,l
OK
set,c,l,check_client_version
SET> T
...
OK
API> save,c,l
...
OK

Try connecting from the 5.3 client: still OK.
Maybe a reinit is necessary to apply the changes:

reinit,c

Try again:

dmadmin@osboxes:~/documentum53$ idql dmtest -Udmadmin -Pdmadmin
 
 
Documentum idql - Interactive document query interface
(c) Copyright Documentum, Inc., 1992 - 2004
All rights reserved.
Client Library Release 5.3.0.115 Linux
 
 
Connecting to Server using docbase dmtest
Could not connect
[DM_SESSION_E_START_FAIL]error: "Server did not start session. Please see your system administrator or check the server log.
Error message from server was:
[DM_SESSION_E_AUTH_FAIL]error: "Authentication failed for user dmadmin with docbase dmtest."
 
"

So a reinit is required indeed.
Note the misleading error, it is not the authentication that is wrong but the client version validation. It is such wrong messages that make diagnosis of Documentum problems so hard and time-consuming. Anyway, let’s revert the check_client_version to F:

set,c,l,check_client_version
F
save,c,l
reinit,c

Try connecting: OK. So, the client version filtering is effective. Let’s try it with a 5.3 client version:

API> set,c,l,oldest_client_version
SET> 5.3
...
OK
API> save,c,l
...
OK
API> set,c,l,check_client_version
SET> T
...
OK
API> save,c,l
...
OK
API> reinit,c
...
OK

Try connecting: OK, that’s expected.
Let’s try it with a minimum 5.2 client version: it still works, which is expected too since the test client’s version is 5.3 and in my book 5.3 > 5.2.
Let’s try it with a minimum 5.4 client version: the connection fails, so client version checking works as expected here.
Let’s try it with a minimum 20.0 client version: the connection fails as expected. No check on the version’s value is done, which is quite understandable programmatically speaking, although a bit optimistic in the context of the turmoil Documentum went through lately.
Let’s go back to a more realistic value:

API> set,c,l,oldest_client_version
SET> 7.2
...
[DM_SESSION_E_AUTH_FAIL]error: "Authentication failed for user dmadmin with docbase dmtest."
 
 
API> save,c,l

Oops, interestingly, the last change did not make it because, with the current setting so far into the future, the present client’s session was disconnected and there is no way to reconnect in order to revert it !
Let’s do the rollback from the database level directly:

sqlplus dmtest@orcl
 
SQL*Plus: Release 12.2.0.1.0 Production on Mon Jun 10 16:25:56 2019
 
Copyright (c) 1982, 2016, Oracle. All rights reserved.
 
Enter password:
Last Successful login time: Mon Jun 10 2019 16:25:40 +02:00
 
Connected to:
Oracle Database 12c Enterprise Edition Release 12.1.0.1.0 - 64bit Production
With the Partitioning, OLAP, Advanced Analytics and Real Application Testing options
 
SQL> update dm_docbase_config_s set check_client_version = 0;
 
1 row updated.
SQL> commit;
 
Commit complete.
 
quit;

Try to connect:

iapi dmtest@docker:1489
Please enter a user (dmadmin):
Please enter password for dmadmin:
 
 
OpenText Documentum iapi - Interactive API interface
Copyright (c) 2018. OpenText Corporation
All rights reserved.
Client Library Release 16.4.0070.0035
 
 
Connecting to Server using docbase dmtest
[DM_SESSION_E_AUTH_FAIL]error: "Authentication failed for user dmadmin with docbase dmtest."

Still not OK because the reinit is missing, but for that we need to connect, which we still cannot do because of the missing reinit. To break this catch-22 situation, let’s cut the Gordian knot and kill the dmtest docbase’s processes:

dmadmin@docker:~$ ps ajxf | grep dmtest
1 27843 27843 27843 ? -1 Ss 1001 0:00 ./documentum -docbase_name dmtest -security acl -init_file /app/dctm/dba/config/dmtest/server.ini
27843 27849 27843 27843 ? -1 S 1001 0:00 \_ /app/dctm/product/16.4/bin/mthdsvr master 0xe901fd2f, 0x7f8a50658000, 0x223000 50000 5 27843 dmtest /app/dctm/dba/log
27849 27850 27843 27843 ? -1 Sl 1001 0:03 | \_ /app/dctm/product/16.4/bin/mthdsvr worker 0xe901fd2f, 0x7f8a50658000, 0x223000 50000 5 0 dmtest /app/dctm/dba/log
27849 27861 27843 27843 ? -1 Sl 1001 0:03 | \_ /app/dctm/product/16.4/bin/mthdsvr worker 0xe901fd2f, 0x7f8a50658000, 0x223000 50000 5 1 dmtest /app/dctm/dba/log
27849 27874 27843 27843 ? -1 Sl 1001 0:03 | \_ /app/dctm/product/16.4/bin/mthdsvr worker 0xe901fd2f, 0x7f8a50658000, 0x223000 50000 5 2 dmtest /app/dctm/dba/log
27849 27886 27843 27843 ? -1 Sl 1001 0:03 | \_ /app/dctm/product/16.4/bin/mthdsvr worker 0xe901fd2f, 0x7f8a50658000, 0x223000 50000 5 3 dmtest /app/dctm/dba/log
27849 27899 27843 27843 ? -1 Sl 1001 0:03 | \_ /app/dctm/product/16.4/bin/mthdsvr worker 0xe901fd2f, 0x7f8a50658000, 0x223000 50000 5 4 dmtest /app/dctm/dba/log
27843 27862 27843 27843 ? -1 S 1001 0:00 \_ ./documentum -docbase_name dmtest -security acl -init_file /app/dctm/dba/config/dmtest/server.ini
27843 27863 27843 27843 ? -1 S 1001 0:00 \_ ./documentum -docbase_name dmtest -security acl -init_file /app/dctm/dba/config/dmtest/server.ini
27843 27875 27843 27843 ? -1 S 1001 0:00 \_ ./documentum -docbase_name dmtest -security acl -init_file /app/dctm/dba/config/dmtest/server.ini
27843 27887 27843 27843 ? -1 S 1001 0:00 \_ ./documentum -docbase_name dmtest -security acl -init_file /app/dctm/dba/config/dmtest/server.ini
27843 27901 27843 27843 ? -1 S 1001 0:00 \_ ./documentum -docbase_name dmtest -security acl -init_file /app/dctm/dba/config/dmtest/server.ini
27843 27944 27843 27843 ? -1 Sl 1001 0:06 \_ ./dm_agent_exec -docbase_name dmtest.dmtest -docbase_owner dmadmin -sleep_duration 0
27843 27962 27843 27843 ? -1 S 1001 0:00 \_ ./documentum -docbase_name dmtest -security acl -init_file /app/dctm/dba/config/dmtest/server.ini

and:

kill -9 -27843
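# note the negative PID: it signals the whole process group led by 27843, i.e. all the docbase’s processes at once;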

After restarting the docbase, the connectivity was restored.
So, be cautious while experimenting ! Needless to say, avoid doing it in a production docbase or in any heavily used development docbase for that matter, or the wrath of the multiverses and beyond will fall upon you and you will be miserable for ever.
Connecting from a 7.3 client
The same behavior and error messages as with the preceding 5.3 client were observed with a more recent 7.3 client and, inferring from the incident above, later clients behave the same way.

Conclusion

We never stop learning stuff with Documentum ! While this client version limiting feature looks quite exotic, it may make sense in order to avoid surprises or even corruptions when using newly implemented extensions or existing but changed areas of the content server. It is possible that new versions of the DfCs behave differently from older ones in dealing with the same functionalities and Documentum had no better choice but to cut the older versions off to prevent any conflict. As usual, the implementation looks a bit hasty, with inapt error messages costing hours of investigation and the risk of locking oneself out of a repository.

The article An exotic feature in the content server: check_client_version appeared first on Blog dbi services.

Discover target database 18c with EM12c

Thu, 2019-06-06 08:34

Working on Enterprise Manager version 12.1.0.4 at a client’s site, we wanted to know whether an Oracle database 18c target could be discovered and monitored, even though Enterprise Manager 12.1.0.4 does not support Oracle 18c database targets.

Installing the 12c agent on the target host did not cause any problem and the Oracle database 18c discovery ran successfully, but the database was shown as down in the Enterprise Manager 12.1.0.4 console.

We tried several tricks without any positive result, but running the following command showed us that this was a connection problem:


oracle@em12c:/home/oracle/:> emctl getmetric agent DB18,oracle_database,Response
Oracle Enterprise Manager Cloud Control 12c Release 4
Copyright (c) 1996, 2014 Oracle Corporation.
All rights reserved.
Status,State,oraerr,Archiver,DatabaseStatus,ActiveState
0,UNKNOWN,
Failed to connect: java.sql.SQLException: 
ORA-28040: No matching authentication protocol,UNKNOWN,UNKNOWN,UNKNOWN

With Oracle 18c, the default value for the SQLNET.ALLOWED_LOGON_VERSION_SERVER parameter is 12; this means that database clients using pre-12c jdbc thin drivers cannot authenticate to 18c database servers.

The workaround is to add the following lines to the database server’s sqlnet.ora:

SQLNET.ALLOWED_LOGON_VERSION_SERVER=11
SQLNET.ALLOWED_LOGON_VERSION_CLIENT=11

We restarted the database and the agent, and the Oracle database 18c was displayed as up and running in Enterprise Manager 12.1.0.4.
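
To double-check from the agent side, the earlier metric query can simply be re-run; it should now come back without the ORA-28040 error:

oracle@em12c:/home/oracle/:> emctl getmetric agent DB18,oracle_database,Response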

Some more tests showed us that everything was working fine: incident management, performance, top activity, and so on.

Nevertheless, do not forget that Oracle database targets in version 18c are not supported with Enterprise Manager 12.1.0.4. I would recommend using the latest Enterprise Manager 13.3 version in order to administer and monitor Oracle database 18c.

The article Discover target database 18c with EM12c appeared first on Blog dbi services.

PostgreSQL partitioning (6): Attaching and detaching partitions

Thu, 2019-06-06 01:00

Having talked about partitioning strategies and partition pruning, this time we will have a look at how you can attach and detach partitions to and from an existing partitioned table. If you missed the previous posts about partitioning in PostgreSQL here they are:

  1. PostgreSQL partitioning (1): Preparing the data set
  2. PostgreSQL partitioning (2): Range partitioning
  3. PostgreSQL partitioning (3): List partitioning
  4. PostgreSQL partitioning (4) : Hash partitioning
  5. PostgreSQL partitioning (5): Partition pruning

From time to time it might be required that you attach a partition to an existing partitioned table. A reason for that might be that you partitioned by year and suddenly you get data for the next year because your data is based on imports from an external source. Of course you could just create a new partition and then load the data, but there is a more elegant way of doing that.

Our range partitioned table currently has partitions for the years 2012 to 2020:

postgres=# \d+ traffic_violations_p
                                      Partitioned table "public.traffic_violations_p"
         Column          |          Type          | Collation | Nullable | Default | Storage  | Stats target | Description 
-------------------------+------------------------+-----------+----------+---------+----------+--------------+-------------
 seqid                   | text                   |           |          |         | extended |              | 
 date_of_stop            | date                   |           |          |         | plain    |              | 
 time_of_stop            | time without time zone |           |          |         | plain    |              | 
 agency                  | text                   |           |          |         | extended |              | 
 subagency               | text                   |           |          |         | extended |              | 
 description             | text                   |           |          |         | extended |              | 
 location                | text                   |           |          |         | extended |              | 
 latitude                | numeric                |           |          |         | main     |              | 
 longitude               | numeric                |           |          |         | main     |              | 
 accident                | text                   |           |          |         | extended |              | 
 belts                   | boolean                |           |          |         | plain    |              | 
 personal_injury         | boolean                |           |          |         | plain    |              | 
 property_damage         | boolean                |           |          |         | plain    |              | 
 fatal                   | boolean                |           |          |         | plain    |              | 
 commercial_license      | boolean                |           |          |         | plain    |              | 
 hazmat                  | boolean                |           |          |         | plain    |              | 
 commercial_vehicle      | boolean                |           |          |         | plain    |              | 
 alcohol                 | boolean                |           |          |         | plain    |              | 
 workzone                | boolean                |           |          |         | plain    |              | 
 state                   | text                   |           |          |         | extended |              | 
 vehicletype             | text                   |           |          |         | extended |              | 
 year                    | smallint               |           |          |         | plain    |              | 
 make                    | text                   |           |          |         | extended |              | 
 model                   | text                   |           |          |         | extended |              | 
 color                   | text                   |           |          |         | extended |              | 
 violation_type          | text                   |           |          |         | extended |              | 
 charge                  | text                   |           |          |         | extended |              | 
 article                 | text                   |           |          |         | extended |              | 
 contributed_to_accident | boolean                |           |          |         | plain    |              | 
 race                    | text                   |           |          |         | extended |              | 
 gender                  | text                   |           |          |         | extended |              | 
 driver_city             | text                   |           |          |         | extended |              | 
 driver_state            | text                   |           |          |         | extended |              | 
 dl_state                | text                   |           |          |         | extended |              | 
 arrest_type             | text                   |           |          |         | extended |              | 
 geolocation             | point                  |           |          |         | plain    |              | 
 council_districts       | smallint               |           |          |         | plain    |              | 
 councils                | smallint               |           |          |         | plain    |              | 
 communities             | smallint               |           |          |         | plain    |              | 
 zip_codes               | smallint               |           |          |         | plain    |              | 
 municipalities          | smallint               |           |          |         | plain    |              | 
Partition key: RANGE (date_of_stop)
Partitions: traffic_violations_p_2012 FOR VALUES FROM ('2012-01-01') TO ('2013-01-01'),
            traffic_violations_p_2013 FOR VALUES FROM ('2013-01-01') TO ('2014-01-01'),
            traffic_violations_p_2014 FOR VALUES FROM ('2014-01-01') TO ('2015-01-01'),
            traffic_violations_p_2015 FOR VALUES FROM ('2015-01-01') TO ('2016-01-01'),
            traffic_violations_p_2016 FOR VALUES FROM ('2016-01-01') TO ('2017-01-01'),
            traffic_violations_p_2017 FOR VALUES FROM ('2017-01-01') TO ('2018-01-01'),
            traffic_violations_p_2018 FOR VALUES FROM ('2018-01-01') TO ('2019-01-01'),
            traffic_violations_p_2019 FOR VALUES FROM ('2019-01-01') TO ('2020-01-01'),
            traffic_violations_p_2020 FOR VALUES FROM ('2020-01-01') TO ('2021-01-01'),
            traffic_violations_p_default DEFAULT

Let’s assume we suddenly get data for the year 2021 and we want that data to be part of the range partitioned table. If we keep the partitioned table as it is, new data would go to the default partition, and this is not what we want. What we can do is create a regular table with exactly the same structure as the partitioned table. With PostgreSQL’s create table command you have the “like_option”, which is very handy for this use case:

postgres=# create table traffic_violations_p_2021 ( like traffic_violations_p including all );
CREATE TABLE
postgres=# \d traffic_violations_p_2021
                     Table "public.traffic_violations_p_2021"
         Column          |          Type          | Collation | Nullable | Default 
-------------------------+------------------------+-----------+----------+---------
 seqid                   | text                   |           |          | 
 date_of_stop            | date                   |           |          | 
 time_of_stop            | time without time zone |           |          | 
 agency                  | text                   |           |          | 
 subagency               | text                   |           |          | 
 description             | text                   |           |          | 
 location                | text                   |           |          | 
 latitude                | numeric                |           |          | 
 longitude               | numeric                |           |          | 
 accident                | text                   |           |          | 
 belts                   | boolean                |           |          | 
 personal_injury         | boolean                |           |          | 
 property_damage         | boolean                |           |          | 
 fatal                   | boolean                |           |          | 
 commercial_license      | boolean                |           |          | 
 hazmat                  | boolean                |           |          | 
 commercial_vehicle      | boolean                |           |          | 
 alcohol                 | boolean                |           |          | 
 workzone                | boolean                |           |          | 
 state                   | text                   |           |          | 
 vehicletype             | text                   |           |          | 
 year                    | smallint               |           |          | 
 make                    | text                   |           |          | 
 model                   | text                   |           |          | 
 color                   | text                   |           |          | 
 violation_type          | text                   |           |          | 
 charge                  | text                   |           |          | 
 article                 | text                   |           |          | 
 contributed_to_accident | boolean                |           |          | 
 race                    | text                   |           |          | 
 gender                  | text                   |           |          | 
 driver_city             | text                   |           |          | 
 driver_state            | text                   |           |          | 
 dl_state                | text                   |           |          | 
 arrest_type             | text                   |           |          | 
 geolocation             | point                  |           |          | 
 council_districts       | smallint               |           |          | 
 councils                | smallint               |           |          | 
 communities             | smallint               |           |          | 
 zip_codes               | smallint               |           |          | 
 municipalities          | smallint               |           |          | 

For now this table has no relation at all to the partitioned table, it just has the very same structure, that’s it. Let’s populate the new table with some data for 2021:

insert into traffic_violations_p_2021 (date_of_stop)
       select * from generate_series ( date('01.01.2021')
                                     , date('12.31.2021')
                                     , interval '1 day');

Now that we have 365 rows in the new table we can attach it to the partitioned table (of course you could also have attached it before loading the data):

alter table traffic_violations_p
      attach partition traffic_violations_p_2021
      for values from ('2021-01-01') to ('2022-01-01');

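A small aside for larger data sets: during the attach, PostgreSQL scans the new table to validate that every row satisfies the partition bounds. Adding a matching check constraint beforehand should allow that validation scan to be skipped (the constraint name below is made up for illustration):

alter table traffic_violations_p_2021
      add constraint traffic_violations_p_2021_bounds
      check ( date_of_stop >= date '2021-01-01'
          and date_of_stop <  date '2022-01-01' );
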
So you can prepare the whole new data without touching the partitioned table. Once you are happy with it a simple attach command is enough and the partitioned table contains the new partition holding the data:

postgres=# \d+ traffic_violations_p
                                      Partitioned table "public.traffic_violations_p"
         Column          |          Type          | Collation | Nullable | Default | Storage  | Stats target | Description 
-------------------------+------------------------+-----------+----------+---------+----------+--------------+-------------
 seqid                   | text                   |           |          |         | extended |              | 
 date_of_stop            | date                   |           |          |         | plain    |              | 
 time_of_stop            | time without time zone |           |          |         | plain    |              | 
 agency                  | text                   |           |          |         | extended |              | 
 subagency               | text                   |           |          |         | extended |              | 
 description             | text                   |           |          |         | extended |              | 
 location                | text                   |           |          |         | extended |              | 
 latitude                | numeric                |           |          |         | main     |              | 
 longitude               | numeric                |           |          |         | main     |              | 
 accident                | text                   |           |          |         | extended |              | 
 belts                   | boolean                |           |          |         | plain    |              | 
 personal_injury         | boolean                |           |          |         | plain    |              | 
 property_damage         | boolean                |           |          |         | plain    |              | 
 fatal                   | boolean                |           |          |         | plain    |              | 
 commercial_license      | boolean                |           |          |         | plain    |              | 
 hazmat                  | boolean                |           |          |         | plain    |              | 
 commercial_vehicle      | boolean                |           |          |         | plain    |              | 
 alcohol                 | boolean                |           |          |         | plain    |              | 
 workzone                | boolean                |           |          |         | plain    |              | 
 state                   | text                   |           |          |         | extended |              | 
 vehicletype             | text                   |           |          |         | extended |              | 
 year                    | smallint               |           |          |         | plain    |              | 
 make                    | text                   |           |          |         | extended |              | 
 model                   | text                   |           |          |         | extended |              | 
 color                   | text                   |           |          |         | extended |              | 
 violation_type          | text                   |           |          |         | extended |              | 
 charge                  | text                   |           |          |         | extended |              | 
 article                 | text                   |           |          |         | extended |              | 
 contributed_to_accident | boolean                |           |          |         | plain    |              | 
 race                    | text                   |           |          |         | extended |              | 
 gender                  | text                   |           |          |         | extended |              | 
 driver_city             | text                   |           |          |         | extended |              | 
 driver_state            | text                   |           |          |         | extended |              | 
 dl_state                | text                   |           |          |         | extended |              | 
 arrest_type             | text                   |           |          |         | extended |              | 
 geolocation             | point                  |           |          |         | plain    |              | 
 council_districts       | smallint               |           |          |         | plain    |              | 
 councils                | smallint               |           |          |         | plain    |              | 
 communities             | smallint               |           |          |         | plain    |              | 
 zip_codes               | smallint               |           |          |         | plain    |              | 
 municipalities          | smallint               |           |          |         | plain    |              | 
Partition key: RANGE (date_of_stop)
Partitions: traffic_violations_p_2012 FOR VALUES FROM ('2012-01-01') TO ('2013-01-01'),
            traffic_violations_p_2013 FOR VALUES FROM ('2013-01-01') TO ('2014-01-01'),
            traffic_violations_p_2014 FOR VALUES FROM ('2014-01-01') TO ('2015-01-01'),
            traffic_violations_p_2015 FOR VALUES FROM ('2015-01-01') TO ('2016-01-01'),
            traffic_violations_p_2016 FOR VALUES FROM ('2016-01-01') TO ('2017-01-01'),
            traffic_violations_p_2017 FOR VALUES FROM ('2017-01-01') TO ('2018-01-01'),
            traffic_violations_p_2018 FOR VALUES FROM ('2018-01-01') TO ('2019-01-01'),
            traffic_violations_p_2019 FOR VALUES FROM ('2019-01-01') TO ('2020-01-01'),
            traffic_violations_p_2020 FOR VALUES FROM ('2020-01-01') TO ('2021-01-01'),
            traffic_violations_p_2021 FOR VALUES FROM ('2021-01-01') TO ('2022-01-01'),
            traffic_violations_p_default DEFAULT

Having the data for 2021, it might not be required anymore to keep the data for 2012. Without partitioning you would need to delete from the table to get rid of those rows. With partitioning you can simply detach the partition from the partitioned table:

postgres=# alter table traffic_violations_p detach partition traffic_violations_p_2012;
ALTER TABLE
postgres=# \d+ traffic_violations_p
                                      Partitioned table "public.traffic_violations_p"
         Column          |          Type          | Collation | Nullable | Default | Storage  | Stats target | Description 
-------------------------+------------------------+-----------+----------+---------+----------+--------------+-------------
 seqid                   | text                   |           |          |         | extended |              | 
 date_of_stop            | date                   |           |          |         | plain    |              | 
 time_of_stop            | time without time zone |           |          |         | plain    |              | 
 agency                  | text                   |           |          |         | extended |              | 
 subagency               | text                   |           |          |         | extended |              | 
 description             | text                   |           |          |         | extended |              | 
 location                | text                   |           |          |         | extended |              | 
 latitude                | numeric                |           |          |         | main     |              | 
 longitude               | numeric                |           |          |         | main     |              | 
 accident                | text                   |           |          |         | extended |              | 
 belts                   | boolean                |           |          |         | plain    |              | 
 personal_injury         | boolean                |           |          |         | plain    |              | 
 property_damage         | boolean                |           |          |         | plain    |              | 
 fatal                   | boolean                |           |          |         | plain    |              | 
 commercial_license      | boolean                |           |          |         | plain    |              | 
 hazmat                  | boolean                |           |          |         | plain    |              | 
 commercial_vehicle      | boolean                |           |          |         | plain    |              | 
 alcohol                 | boolean                |           |          |         | plain    |              | 
 workzone                | boolean                |           |          |         | plain    |              | 
 state                   | text                   |           |          |         | extended |              | 
 vehicletype             | text                   |           |          |         | extended |              | 
 year                    | smallint               |           |          |         | plain    |              | 
 make                    | text                   |           |          |         | extended |              | 
 model                   | text                   |           |          |         | extended |              | 
 color                   | text                   |           |          |         | extended |              | 
 violation_type          | text                   |           |          |         | extended |              | 
 charge                  | text                   |           |          |         | extended |              | 
 article                 | text                   |           |          |         | extended |              | 
 contributed_to_accident | boolean                |           |          |         | plain    |              | 
 race                    | text                   |           |          |         | extended |              | 
 gender                  | text                   |           |          |         | extended |              | 
 driver_city             | text                   |           |          |         | extended |              | 
 driver_state            | text                   |           |          |         | extended |              | 
 dl_state                | text                   |           |          |         | extended |              | 
 arrest_type             | text                   |           |          |         | extended |              | 
 geolocation             | point                  |           |          |         | plain    |              | 
 council_districts       | smallint               |           |          |         | plain    |              | 
 councils                | smallint               |           |          |         | plain    |              | 
 communities             | smallint               |           |          |         | plain    |              | 
 zip_codes               | smallint               |           |          |         | plain    |              | 
 municipalities          | smallint               |           |          |         | plain    |              | 
Partition key: RANGE (date_of_stop)
Partitions: traffic_violations_p_2013 FOR VALUES FROM ('2013-01-01') TO ('2014-01-01'),
            traffic_violations_p_2014 FOR VALUES FROM ('2014-01-01') TO ('2015-01-01'),
            traffic_violations_p_2015 FOR VALUES FROM ('2015-01-01') TO ('2016-01-01'),
            traffic_violations_p_2016 FOR VALUES FROM ('2016-01-01') TO ('2017-01-01'),
            traffic_violations_p_2017 FOR VALUES FROM ('2017-01-01') TO ('2018-01-01'),
            traffic_violations_p_2018 FOR VALUES FROM ('2018-01-01') TO ('2019-01-01'),
            traffic_violations_p_2019 FOR VALUES FROM ('2019-01-01') TO ('2020-01-01'),
            traffic_violations_p_2020 FOR VALUES FROM ('2020-01-01') TO ('2021-01-01'),
            traffic_violations_p_2021 FOR VALUES FROM ('2021-01-01') TO ('2022-01-01'),
            traffic_violations_p_default DEFAULT

Whatever you want to do with the data from now on: the old partition became a regular table and you can either drop it, truncate it, or move it to low-cost storage for archiving.
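For example, getting rid of the old data is now just an ordinary drop (truncating it would work the same way):

postgres=# drop table traffic_violations_p_2012;
DROP TABLE
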
This was attaching and detaching of partitions. In the next post we’ll look at indexing and constraints when it comes to partitioned tables.

The article PostgreSQL partitioning (6): Attaching and detaching partitions appeared first on Blog dbi services.

PostgreSQL partitioning (5): Partition pruning

Wed, 2019-06-05 01:00

This is the next post in the PostgreSQL partitioning series. If you missed the previous ones here they are:

  1. PostgreSQL partitioning (1): Preparing the data set
  2. PostgreSQL partitioning (2): Range partitioning
  3. PostgreSQL partitioning (3): List partitioning
  4. PostgreSQL partitioning (4): Hash partitioning

This time we will have a look at partition pruning. Never heard of that? Don’t worry, hopefully it will be clear at the end of this blog post.

When you ask Wikipedia for pruning you get this: “Pruning is a horticultural and silvicultural practice involving the selective removal of certain parts of a plant, such as branches, buds, or roots. Reasons to prune plants include deadwood removal, shaping (by controlling or redirecting growth), improving or sustaining health, reducing risk from falling branches, preparing nursery specimens for transplanting, and both harvesting and increasing the yield or quality of flowers and fruits.”

Although this is about plants, it almost exactly describes partition pruning as well: “selective removal of certain parts of a plant”. In our case it is the removal of partitions when it is known that they cannot contain the data we are asking for.

Let’s come back to our range partitioned table. We partitioned the table by the “date_of_stop” column:

postgres=# select * from pg_partition_tree('traffic_violations_p');
            relid             |     parentrelid      | isleaf | level 
------------------------------+----------------------+--------+-------
 traffic_violations_p         |                      | f      |     0
 traffic_violations_p_default | traffic_violations_p | t      |     1
 traffic_violations_p_2012    | traffic_violations_p | t      |     1
 traffic_violations_p_2013    | traffic_violations_p | t      |     1
 traffic_violations_p_2014    | traffic_violations_p | t      |     1
 traffic_violations_p_2015    | traffic_violations_p | t      |     1
 traffic_violations_p_2016    | traffic_violations_p | t      |     1
 traffic_violations_p_2017    | traffic_violations_p | t      |     1
 traffic_violations_p_2018    | traffic_violations_p | t      |     1
 traffic_violations_p_2019    | traffic_violations_p | t      |     1
 traffic_violations_p_2020    | traffic_violations_p | t      |     1

Each partition contains data from one year. If we ask for data from 2013 PostgreSQL should only read that partition and just ignore the others.

postgres=# explain select count(*) from traffic_violations_p where date_of_stop = date('02-FEB-2013');
                                                QUERY PLAN                                                
----------------------------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=11393.96..11393.97 rows=1 width=8)
   ->  Gather  (cost=11393.75..11393.96 rows=2 width=8)
         Workers Planned: 2
         ->  Partial Aggregate  (cost=10393.75..10393.76 rows=1 width=8)
               ->  Parallel Seq Scan on traffic_violations_p_2013  (cost=0.00..10393.29 rows=185 width=0)
                     Filter: (date_of_stop = '2013-02-02'::date)
(6 rows)

Indeed that is happening and only traffic_violations_p_2013 is considered. All other partitions are just ignored and that of course is a performance improvement. This is the simple case: partition pruning at planning time. Because we have a literal in the where clause, PostgreSQL can already decide at planning time which partitions it needs to read and which can be skipped.

Consider this example:

select count(*) 
  from traffic_violations_p 
 where date_of_stop = ( select date('01-FEB-2013') );

In this case PostgreSQL cannot know at planning time what the actual value coming back from the sub-select will be. If you are on PostgreSQL 10 there is not much you can do, as partition pruning at execution time only made it into PostgreSQL 11. As I am on PostgreSQL 12 Beta 1 it should work:

postgres=# explain
select count(*) 
  from traffic_violations_p 
 where date_of_stop = ( select to_date('01.01.2014','DD.MM.YYYY'));
                                                   QUERY PLAN                                                   
----------------------------------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=85047.50..85047.51 rows=1 width=8)
   InitPlan 1 (returns $0)
     ->  Result  (cost=0.00..0.01 rows=1 width=4)
   ->  Gather  (cost=85047.28..85047.49 rows=2 width=8)
         Workers Planned: 2
         Params Evaluated: $0
         ->  Partial Aggregate  (cost=84047.28..84047.29 rows=1 width=8)
               ->  Parallel Append  (cost=0.00..84042.52 rows=1901 width=0)
                     ->  Parallel Seq Scan on traffic_violations_p_2015  (cost=0.00..12924.89 rows=269 width=0)
                           Filter: (date_of_stop = $0)
                     ->  Parallel Seq Scan on traffic_violations_p_2014  (cost=0.00..12235.54 rows=255 width=0)
                           Filter: (date_of_stop = $0)
                     ->  Parallel Seq Scan on traffic_violations_p_2016  (cost=0.00..12097.57 rows=252 width=0)
                           Filter: (date_of_stop = $0)
                     ->  Parallel Seq Scan on traffic_violations_p_2018  (cost=0.00..12051.87 rows=249 width=0)
                           Filter: (date_of_stop = $0)
                     ->  Parallel Seq Scan on traffic_violations_p_2017  (cost=0.00..10996.34 rows=228 width=0)
                           Filter: (date_of_stop = $0)
                     ->  Parallel Seq Scan on traffic_violations_p_2013  (cost=0.00..10393.29 rows=218 width=0)
                           Filter: (date_of_stop = $0)
                     ->  Parallel Seq Scan on traffic_violations_p_2012  (cost=0.00..8351.41 rows=172 width=0)
                           Filter: (date_of_stop = $0)
                     ->  Parallel Seq Scan on traffic_violations_p_2019  (cost=0.00..4959.83 rows=246 width=0)
                           Filter: (date_of_stop = $0)
                     ->  Parallel Seq Scan on traffic_violations_p_default  (cost=0.00..11.54 rows=15 width=0)
                           Filter: (date_of_stop = $0)
                     ->  Parallel Seq Scan on traffic_violations_p_2020  (cost=0.00..10.74 rows=1 width=0)
                           Filter: (date_of_stop = $0)
(28 rows)

The execution plan shows that all partitions will be scanned and that is no surprise. When you want to see partition pruning at execution time you actually have to execute the statement, so explain(analyze):

postgres=# explain(analyze)
select count(*) 
  from traffic_violations_p 
 where date_of_stop = ( select to_date('01.01.2014','DD.MM.YYYY'));
                                                                           QUERY PLAN                                                                           
----------------------------------------------------------------------------------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=85047.50..85047.51 rows=1 width=8) (actual time=149.747..149.747 rows=1 loops=1)
   InitPlan 1 (returns $0)
     ->  Result  (cost=0.00..0.01 rows=1 width=4) (actual time=0.012..0.013 rows=1 loops=1)
   ->  Gather  (cost=85047.28..85047.49 rows=2 width=8) (actual time=145.730..150.004 rows=3 loops=1)
         Workers Planned: 2
         Params Evaluated: $0
         Workers Launched: 2
         ->  Partial Aggregate  (cost=84047.28..84047.29 rows=1 width=8) (actual time=119.148..119.149 rows=1 loops=3)
               ->  Parallel Append  (cost=0.00..84042.52 rows=1901 width=0) (actual time=119.052..119.127 rows=189 loops=3)
                     ->  Parallel Seq Scan on traffic_violations_p_2015  (cost=0.00..12924.89 rows=269 width=0) (never executed)
                           Filter: (date_of_stop = $0)
                     ->  Parallel Seq Scan on traffic_violations_p_2014  (cost=0.00..12235.54 rows=255 width=0) (actual time=119.024..119.077 rows=189 loops=3)
                           Filter: (date_of_stop = $0)
                           Rows Removed by Filter: 74405
                     ->  Parallel Seq Scan on traffic_violations_p_2016  (cost=0.00..12097.57 rows=252 width=0) (never executed)
                           Filter: (date_of_stop = $0)
                     ->  Parallel Seq Scan on traffic_violations_p_2018  (cost=0.00..12051.87 rows=249 width=0) (never executed)
                           Filter: (date_of_stop = $0)
                     ->  Parallel Seq Scan on traffic_violations_p_2017  (cost=0.00..10996.34 rows=228 width=0) (never executed)
                           Filter: (date_of_stop = $0)
                     ->  Parallel Seq Scan on traffic_violations_p_2013  (cost=0.00..10393.29 rows=218 width=0) (never executed)
                           Filter: (date_of_stop = $0)
                     ->  Parallel Seq Scan on traffic_violations_p_2012  (cost=0.00..8351.41 rows=172 width=0) (never executed)
                           Filter: (date_of_stop = $0)
                     ->  Parallel Seq Scan on traffic_violations_p_2019  (cost=0.00..4959.83 rows=246 width=0) (never executed)
                           Filter: (date_of_stop = $0)
                     ->  Parallel Seq Scan on traffic_violations_p_default  (cost=0.00..11.54 rows=15 width=0) (never executed)
                           Filter: (date_of_stop = $0)
                     ->  Parallel Seq Scan on traffic_violations_p_2020  (cost=0.00..10.74 rows=1 width=0) (never executed)
                           Filter: (date_of_stop = $0)
 Planning Time: 0.604 ms
 Execution Time: 150.120 ms
(32 rows)

The keyword here is “(never executed)”: this is partition pruning at execution time. Only the partition that can actually contain the date was scanned, all others were skipped. There are some limitations with this, but those will be covered in the last post of this series, where we look at corner cases of partitioning in PostgreSQL.
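
Note that execution-time pruning, like plan-time pruning, depends on the enable_partition_pruning parameter (available since PostgreSQL 11, on by default). If you do not see “(never executed)” in your plans, this is worth checking; a quick sketch:

show enable_partition_pruning;
-- switch pruning off to compare, then re-run the explain (analyze) from above
set enable_partition_pruning = off;
-- and enable it again afterwards
set enable_partition_pruning = on;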

The next post will cover attaching and detaching of partitions.


PostgreSQL partitioning (4): Hash partitioning

Tue, 2019-06-04 01:00

The last partitioning strategy we will look at is hash partitioning. If you missed the first posts in this series, here they are:

  1. PostgreSQL partitioning (1): Preparing the data set
  2. PostgreSQL partitioning (2): Range partitioning
  3. PostgreSQL partitioning (3): List partitioning

Usually hash partitioning is used when you do not have a natural way of partitioning your data, or when you want to evenly distribute the data based on a hash of the partition key.

In PostgreSQL hash partitioning might seem a bit strange at first because you define a modulus and a remainder, and that determines which partition a row goes to.
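
To make the modulus/remainder idea concrete, here is a minimal, generic sketch (table and column names are invented for illustration): every row is routed to the partition whose remainder matches the hash of the key modulo the number of partitions.

-- two hash partitions: rows go to the partition where hash(id) % 2 matches
create table t_hash ( id int )
partition by hash (id);

create table t_hash_p0
partition of t_hash
for values with (modulus 2, remainder 0);

create table t_hash_p1
partition of t_hash
for values with (modulus 2, remainder 1);

Having a look at our materialized view, which contains our data set: which column might be a good candidate for hash partitioning?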

postgres=# \d mv_traffic_violations 
                 Materialized view "public.mv_traffic_violations"
         Column          |          Type          | Collation | Nullable | Default 
-------------------------+------------------------+-----------+----------+---------
 seqid                   | text                   |           |          | 
 date_of_stop            | date                   |           |          | 
 time_of_stop            | time without time zone |           |          | 
 agency                  | text                   |           |          | 
 subagency               | text                   |           |          | 
 description             | text                   |           |          | 
 location                | text                   |           |          | 
 latitude                | numeric                |           |          | 
 longitude               | numeric                |           |          | 
 accident                | text                   |           |          | 
 belts                   | boolean                |           |          | 
 personal_injury         | boolean                |           |          | 
 property_damage         | boolean                |           |          | 
 fatal                   | boolean                |           |          | 
 commercial_license      | boolean                |           |          | 
 hazmat                  | boolean                |           |          | 
 commercial_vehicle      | boolean                |           |          | 
 alcohol                 | boolean                |           |          | 
 workzone                | boolean                |           |          | 
 state                   | text                   |           |          | 
 vehicletype             | text                   |           |          | 
 year                    | smallint               |           |          | 
 make                    | text                   |           |          | 
 model                   | text                   |           |          | 
 color                   | text                   |           |          | 
 violation_type          | text                   |           |          | 
 charge                  | text                   |           |          | 
 article                 | text                   |           |          | 
 contributed_to_accident | boolean                |           |          | 
 race                    | text                   |           |          | 
 gender                  | text                   |           |          | 
 driver_city             | text                   |           |          | 
 driver_state            | text                   |           |          | 
 dl_state                | text                   |           |          | 
 arrest_type             | text                   |           |          | 
 geolocation             | point                  |           |          | 
 council_districts       | smallint               |           |          | 
 councils                | smallint               |           |          | 
 communities             | smallint               |           |          | 
 zip_codes               | smallint               |           |          | 
 municipalities          | smallint               |           |          | 

A candidate could be the councils column. If we set up the hash partitioning on it, five partitions would seem natural:

postgres=# select distinct councils from mv_traffic_violations ;
 councils 
----------
         
        1
        3
        5
        2
        4
(6 rows)

The partitioned table becomes:

create table traffic_violations_p_hash
( seqid text
, date_of_stop date  
, time_of_stop time
, agency text
, subagency text
, description text 
, location text
, latitude numeric
, longitude numeric
, accident text
, belts boolean
, personal_injury boolean 
, property_damage boolean
, fatal boolean
, commercial_license boolean
, hazmat boolean
, commercial_vehicle boolean
, alcohol boolean
, workzone boolean
, state text
, vehicletype text 
, year smallint
, make text
, model text
, color text
, violation_type text 
, charge text
, article text
, contributed_to_accident boolean
, race text
, gender text
, driver_city text
, driver_state text
, dl_state text
, arrest_type text
, geolocation point
, council_districts smallint 
, councils smallint
, communities smallint
, zip_codes smallint
, municipalities smallint
)
partition by hash (councils);

As usual psql will report the partitioning strategy for the table as well:

postgres=# \d traffic_violations_p_hash 
               Partitioned table "public.traffic_violations_p_hash"
         Column          |          Type          | Collation | Nullable | Default 
-------------------------+------------------------+-----------+----------+---------
 seqid                   | text                   |           |          | 
 date_of_stop            | date                   |           |          | 
 time_of_stop            | time without time zone |           |          | 
 agency                  | text                   |           |          | 
 subagency               | text                   |           |          | 
 description             | text                   |           |          | 
 location                | text                   |           |          | 
 latitude                | numeric                |           |          | 
 longitude               | numeric                |           |          | 
 accident                | text                   |           |          | 
 belts                   | boolean                |           |          | 
 personal_injury         | boolean                |           |          | 
 property_damage         | boolean                |           |          | 
 fatal                   | boolean                |           |          | 
 commercial_license      | boolean                |           |          | 
 hazmat                  | boolean                |           |          | 
 commercial_vehicle      | boolean                |           |          | 
 alcohol                 | boolean                |           |          | 
 workzone                | boolean                |           |          | 
 state                   | text                   |           |          | 
 vehicletype             | text                   |           |          | 
 year                    | smallint               |           |          | 
 make                    | text                   |           |          | 
 model                   | text                   |           |          | 
 color                   | text                   |           |          | 
 violation_type          | text                   |           |          | 
 charge                  | text                   |           |          | 
 article                 | text                   |           |          | 
 contributed_to_accident | boolean                |           |          | 
 race                    | text                   |           |          | 
 gender                  | text                   |           |          | 
 driver_city             | text                   |           |          | 
 driver_state            | text                   |           |          | 
 dl_state                | text                   |           |          | 
 arrest_type             | text                   |           |          | 
 geolocation             | point                  |           |          | 
 council_districts       | smallint               |           |          | 
 councils                | smallint               |           |          | 
 communities             | smallint               |           |          | 
 zip_codes               | smallint               |           |          | 
 municipalities          | smallint               |           |          | 
Partition key: HASH (councils)
Number of partitions: 0

The setup for the hash partitions will be as follows:

create table traffic_violations_p_hash_p1
partition of traffic_violations_p_hash
for values with (modulus 5, remainder 0);

create table traffic_violations_p_hash_p2
partition of traffic_violations_p_hash
for values with (modulus 5, remainder 1);

create table traffic_violations_p_hash_p3
partition of traffic_violations_p_hash
for values with (modulus 5, remainder 2);

create table traffic_violations_p_hash_p4
partition of traffic_violations_p_hash
for values with (modulus 5, remainder 3);

create table traffic_violations_p_hash_p5
partition of traffic_violations_p_hash
for values with (modulus 5, remainder 4);

Hash-partitioned tables cannot have a default partition: every possible hash value maps to exactly one modulus/remainder combination, so a default partition could never receive a row. When you try to create one anyway you will get an error:

postgres=# create table traffic_violations_p_hash_default
postgres-# partition of traffic_violations_p_hash default;
psql: ERROR:  a hash-partitioned table may not have a default partition

The final setup of our hash partitioned table is:

postgres=# \d+ traffic_violations_p_hash
                                   Partitioned table "public.traffic_violations_p_hash"
         Column          |          Type          | Collation | Nullable | Default | Storage  | Stats target | Description 
-------------------------+------------------------+-----------+----------+---------+----------+--------------+-------------
 seqid                   | text                   |           |          |         | extended |              | 
 date_of_stop            | date                   |           |          |         | plain    |              | 
 time_of_stop            | time without time zone |           |          |         | plain    |              | 
 agency                  | text                   |           |          |         | extended |              | 
 subagency               | text                   |           |          |         | extended |              | 
 description             | text                   |           |          |         | extended |              | 
 location                | text                   |           |          |         | extended |              | 
 latitude                | numeric                |           |          |         | main     |              | 
 longitude               | numeric                |           |          |         | main     |              | 
 accident                | text                   |           |          |         | extended |              | 
 belts                   | boolean                |           |          |         | plain    |              | 
 personal_injury         | boolean                |           |          |         | plain    |              | 
 property_damage         | boolean                |           |          |         | plain    |              | 
 fatal                   | boolean                |           |          |         | plain    |              | 
 commercial_license      | boolean                |           |          |         | plain    |              | 
 hazmat                  | boolean                |           |          |         | plain    |              | 
 commercial_vehicle      | boolean                |           |          |         | plain    |              | 
 alcohol                 | boolean                |           |          |         | plain    |              | 
 workzone                | boolean                |           |          |         | plain    |              | 
 state                   | text                   |           |          |         | extended |              | 
 vehicletype             | text                   |           |          |         | extended |              | 
 year                    | smallint               |           |          |         | plain    |              | 
 make                    | text                   |           |          |         | extended |              | 
 model                   | text                   |           |          |         | extended |              | 
 color                   | text                   |           |          |         | extended |              | 
 violation_type          | text                   |           |          |         | extended |              | 
 charge                  | text                   |           |          |         | extended |              | 
 article                 | text                   |           |          |         | extended |              | 
 contributed_to_accident | boolean                |           |          |         | plain    |              | 
 race                    | text                   |           |          |         | extended |              | 
 gender                  | text                   |           |          |         | extended |              | 
 driver_city             | text                   |           |          |         | extended |              | 
 driver_state            | text                   |           |          |         | extended |              | 
 dl_state                | text                   |           |          |         | extended |              | 
 arrest_type             | text                   |           |          |         | extended |              | 
 geolocation             | point                  |           |          |         | plain    |              | 
 council_districts       | smallint               |           |          |         | plain    |              | 
 councils                | smallint               |           |          |         | plain    |              | 
 communities             | smallint               |           |          |         | plain    |              | 
 zip_codes               | smallint               |           |          |         | plain    |              | 
 municipalities          | smallint               |           |          |         | plain    |              | 
Partition key: HASH (councils)
Partitions: traffic_violations_p_hash_p1 FOR VALUES WITH (modulus 5, remainder 0),
            traffic_violations_p_hash_p2 FOR VALUES WITH (modulus 5, remainder 1),
            traffic_violations_p_hash_p3 FOR VALUES WITH (modulus 5, remainder 2),
            traffic_violations_p_hash_p4 FOR VALUES WITH (modulus 5, remainder 3),
            traffic_violations_p_hash_p5 FOR VALUES WITH (modulus 5, remainder 4)

Loading the data and checking the partitions:

postgres=# insert into traffic_violations_p_hash
postgres-#        select * from mv_traffic_violations;
INSERT 0 1528078
postgres=# select count(*) from traffic_violations_p_hash_p1;
 count  
--------
 988085
(1 row)

postgres=# select count(*) from traffic_violations_p_hash_p2;
 count  
--------
 539993
(1 row)

postgres=# select count(*) from traffic_violations_p_hash_p3;
 count 
-------
     0
(1 row)

postgres=# select count(*) from traffic_violations_p_hash_p4;
 count 
-------
     0
(1 row)

postgres=# select count(*) from traffic_violations_p_hash_p5;
 count 
-------
     0
(1 row)

Why do we only have data in the first and second partitions? Let's look more closely at the distribution of the councils in the materialized view:

postgres=# select distinct councils, count(*) from mv_traffic_violations group by councils;
 councils | count  
----------+--------
        1 | 231070
        2 | 211759
        3 | 328234
        4 | 334142
        5 | 308402
          | 114471
(6 rows)

First of all, we have a lot of rows which do not have a council at all. NULL values in hash-partitioned tables always go to the partition where the remainder is 0:

postgres=# truncate table traffic_violations_p_hash;
TRUNCATE TABLE
postgres=# insert into traffic_violations_p_hash (councils) select null from generate_series(1,100);
INSERT 0 100
postgres=# select count(*) from traffic_violations_p_hash_p1;
 count 
-------
   100
(1 row)

Here is the reason for this behavior: “The calculated hash value for the null value will be zero, therefore, it will fall to the partition having remainder zero.”
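
If you want to check where a value would end up without actually inserting it, PostgreSQL provides the internal helper function satisfies_hash_partition() (available since PostgreSQL 11). A small sketch; note the cast to smallint, as hashing is type-specific and the partition key here is a smallint:

-- returns true: a NULL key satisfies the constraint of the remainder-0 partition
select satisfies_hash_partition('traffic_violations_p_hash'::regclass, 5, 0, null::smallint);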

This still does not explain why partitions 3 to 5 are empty. Let's do a few tests with our councils:

postgres=# truncate traffic_violations_p_hash;
TRUNCATE TABLE
postgres=# insert into traffic_violations_p_hash (councils) values (1),(2),(3),(4),(5);
INSERT 0 5
postgres=# select councils from traffic_violations_p_hash_p1;
 councils 
----------
        1
        4
        5
(3 rows)

postgres=# select councils from traffic_violations_p_hash_p2;
 councils 
----------
        2
        3
(2 rows)

This confirms the behavior: councils 1, 4 and 5 hash to the first partition while 2 and 3 hash to the second one. The decision to hash partition on the councils column clearly was not a good idea. Hash partitioning works well when you have many distinct values:

postgres=# truncate traffic_violations_p_hash;
TRUNCATE TABLE
postgres=# insert into traffic_violations_p_hash (councils) select * from generate_series(1,10000);
INSERT 0 10000
postgres=# select count(*) from traffic_violations_p_hash_p1;
 count 
-------
  1969
(1 row)

postgres=# select count(*) from traffic_violations_p_hash_p2;
 count 
-------
  2034
(1 row)

postgres=# select count(*) from traffic_violations_p_hash_p3;
 count 
-------
  2058
(1 row)

postgres=# select count(*) from traffic_violations_p_hash_p4;
 count 
-------
  1928
(1 row)

postgres=# select count(*) from traffic_violations_p_hash_p5;
 count 
-------
  2011
(1 row)

This gives a much better distribution of the data. In our data set a good candidate is the seqid column:

drop table traffic_violations_p_hash, traffic_violations_p_hash_p1, traffic_violations_p_hash_p2, traffic_violations_p_hash_p3, traffic_violations_p_hash_p4, traffic_violations_p_hash_p5;
create table traffic_violations_p_hash
( seqid text
, date_of_stop date  
, time_of_stop time
, agency text
, subagency text
, description text 
, location text
, latitude numeric
, longitude numeric
, accident text
, belts boolean
, personal_injury boolean 
, property_damage boolean
, fatal boolean
, commercial_license boolean
, hazmat boolean
, commercial_vehicle boolean
, alcohol boolean
, workzone boolean
, state text
, vehicletype text 
, year smallint
, make text
, model text
, color text
, violation_type text 
, charge text
, article text
, contributed_to_accident boolean
, race text
, gender text
, driver_city text
, driver_state text
, dl_state text
, arrest_type text
, geolocation point
, council_districts smallint 
, councils smallint
, communities smallint
, zip_codes smallint
, municipalities smallint
)
partition by hash (seqid);

create table traffic_violations_p_hash_p1
partition of traffic_violations_p_hash
for values with (modulus 5, remainder 0);

create table traffic_violations_p_hash_p2
partition of traffic_violations_p_hash
for values with (modulus 5, remainder 1);

create table traffic_violations_p_hash_p3
partition of traffic_violations_p_hash
for values with (modulus 5, remainder 2);

create table traffic_violations_p_hash_p4
partition of traffic_violations_p_hash
for values with (modulus 5, remainder 3);

create table traffic_violations_p_hash_p5
partition of traffic_violations_p_hash
for values with (modulus 5, remainder 4);

Loading the data again and checking the partitions:

postgres=# insert into traffic_violations_p_hash
postgres-#        select * from mv_traffic_violations;
INSERT 0 1528078
postgres=# select count(*) from traffic_violations_p_hash_p1;
 count  
--------
 305253
(1 row)

postgres=# select count(*) from traffic_violations_p_hash_p2;
 count  
--------
 304999
(1 row)

postgres=# select count(*) from traffic_violations_p_hash_p3;
 count  
--------
 305215
(1 row)

postgres=# select count(*) from traffic_violations_p_hash_p4;
 count  
--------
 305719
(1 row)

postgres=# select count(*) from traffic_violations_p_hash_p5;
 count  
--------
 306892
(1 row)
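
Instead of counting each partition separately, you can get the same overview with a single query against the parent table by grouping on the system column tableoid; this works for any partitioned table:

-- tableoid identifies the physical table (i.e. the partition) a row lives in
select tableoid::regclass as partition, count(*)
  from traffic_violations_p_hash
 group by tableoid
 order by 1;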

That was hash partitioning. In the next post we’ll look at partition pruning.


PostgreSQL partitioning (3): List partitioning

Mon, 2019-06-03 09:12

In the last posts of this series we prepared the data set and had a look at range partitioning. In this post we look at another partitioning strategy: list partitioning. Instead of partitioning by a range (typically based on day, month or year), list partitioning is used to partition on an explicit list of key values that define the partitions.
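
Before we apply this to our data set, here is a minimal, generic sketch of the syntax (table, column and key values are invented for illustration):

-- each partition is defined by the explicit key values it accepts
create table t_list ( region text )
partition by list (region);

create table t_list_emea
partition of t_list
for values in ('EMEA');

create table t_list_apac
partition of t_list
for values in ('APAC');

-- everything that matches no list goes to the default partition
create table t_list_default
partition of t_list default;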

Coming back to the materialized view which holds our initial data set:

postgres=# \d mv_traffic_violations 
                 Materialized view "public.mv_traffic_violations"
         Column          |          Type          | Collation | Nullable | Default 
-------------------------+------------------------+-----------+----------+---------
 seqid                   | text                   |           |          | 
 date_of_stop            | date                   |           |          | 
 time_of_stop            | time without time zone |           |          | 
 agency                  | text                   |           |          | 
 subagency               | text                   |           |          | 
 description             | text                   |           |          | 
 location                | text                   |           |          | 
 latitude                | numeric                |           |          | 
 longitude               | numeric                |           |          | 
 accident                | text                   |           |          | 
 belts                   | boolean                |           |          | 
 personal_injury         | boolean                |           |          | 
 property_damage         | boolean                |           |          | 
 fatal                   | boolean                |           |          | 
 commercial_license      | boolean                |           |          | 
 hazmat                  | boolean                |           |          | 
 commercial_vehicle      | boolean                |           |          | 
 alcohol                 | boolean                |           |          | 
 workzone                | boolean                |           |          | 
 state                   | text                   |           |          | 
 vehicletype             | text                   |           |          | 
 year                    | smallint               |           |          | 
 make                    | text                   |           |          | 
 model                   | text                   |           |          | 
 color                   | text                   |           |          | 
 violation_type          | text                   |           |          | 
 charge                  | text                   |           |          | 
 article                 | text                   |           |          | 
 contributed_to_accident | boolean                |           |          | 
 race                    | text                   |           |          | 
 gender                  | text                   |           |          | 
 driver_city             | text                   |           |          | 
 driver_state            | text                   |           |          | 
 dl_state                | text                   |           |          | 
 arrest_type             | text                   |           |          | 
 geolocation             | point                  |           |          | 
 council_districts       | smallint               |           |          | 
 councils                | smallint               |           |          | 
 communities             | smallint               |           |          | 
 zip_codes               | smallint               |           |          | 
 municipalities          | smallint               |           |          | 

The application might want to ask for all traffic violations that match a specific violation type:

postgres=# select distinct violation_type from mv_traffic_violations;
 violation_type 
----------------
 Warning
 SERO
 Citation
 ESERO
(4 rows)

Partitioning by that would result in 4 partitions (plus one default partition for any other potential violation type in the future). The definition of the table and its partitions is pretty much the same as for range partitioning, but this time we go for list partitioning:

create table traffic_violations_p_list
( seqid text
, date_of_stop date  
, time_of_stop time
, agency text
, subagency text
, description text 
, location text
, latitude numeric
, longitude numeric
, accident text
, belts boolean
, personal_injury boolean 
, property_damage boolean
, fatal boolean
, commercial_license boolean
, hazmat boolean
, commercial_vehicle boolean
, alcohol boolean
, workzone boolean
, state text
, vehicletype text 
, year smallint
, make text
, model text
, color text
, violation_type text 
, charge text
, article text
, contributed_to_accident boolean
, race text
, gender text
, driver_city text
, driver_state text
, dl_state text
, arrest_type text
, geolocation point
, council_districts smallint 
, councils smallint
, communities smallint
, zip_codes smallint
, municipalities smallint
)
partition by list (violation_type);

psql will show the list partitioning strategy as well:

postgres=# \d traffic_violations_p_list
               Partitioned table "public.traffic_violations_p_list"
         Column          |          Type          | Collation | Nullable | Default 
-------------------------+------------------------+-----------+----------+---------
 seqid                   | text                   |           |          | 
 date_of_stop            | date                   |           |          | 
 time_of_stop            | time without time zone |           |          | 
 agency                  | text                   |           |          | 
 subagency               | text                   |           |          | 
 description             | text                   |           |          | 
 location                | text                   |           |          | 
 latitude                | numeric                |           |          | 
 longitude               | numeric                |           |          | 
 accident                | text                   |           |          | 
 belts                   | boolean                |           |          | 
 personal_injury         | boolean                |           |          | 
 property_damage         | boolean                |           |          | 
 fatal                   | boolean                |           |          | 
 commercial_license      | boolean                |           |          | 
 hazmat                  | boolean                |           |          | 
 commercial_vehicle      | boolean                |           |          | 
 alcohol                 | boolean                |           |          | 
 workzone                | boolean                |           |          | 
 state                   | text                   |           |          | 
 vehicletype             | text                   |           |          | 
 year                    | smallint               |           |          | 
 make                    | text                   |           |          | 
 model                   | text                   |           |          | 
 color                   | text                   |           |          | 
 violation_type          | text                   |           |          | 
 charge                  | text                   |           |          | 
 article                 | text                   |           |          | 
 contributed_to_accident | boolean                |           |          | 
 race                    | text                   |           |          | 
 gender                  | text                   |           |          | 
 driver_city             | text                   |           |          | 
 driver_state            | text                   |           |          | 
 dl_state                | text                   |           |          | 
 arrest_type             | text                   |           |          | 
 geolocation             | point                  |           |          | 
 council_districts       | smallint               |           |          | 
 councils                | smallint               |           |          | 
 communities             | smallint               |           |          | 
 zip_codes               | smallint               |           |          | 
 municipalities          | smallint               |           |          | 
Partition key: LIST (violation_type)
Number of partitions: 0

Now we can create the partitions:

create table traffic_violations_p_list_warning
partition of traffic_violations_p_list
for values in ('Warning');

create table traffic_violations_p_list_sero
partition of traffic_violations_p_list
for values in ('SERO');

create table traffic_violations_p_list_Citation
partition of traffic_violations_p_list
for values in ('Citation');

create table traffic_violations_p_list_ESERO
partition of traffic_violations_p_list
for values in ('ESERO');

create table traffic_violations_p_list_default
    partition of traffic_violations_p_list DEFAULT;

Again, psql will display the partitions nicely:

postgres=# \d+ traffic_violations_p_list
                                   Partitioned table "public.traffic_violations_p_list"
         Column          |          Type          | Collation | Nullable | Default | Storage  | Stats target | Description 
-------------------------+------------------------+-----------+----------+---------+----------+--------------+-------------
 seqid                   | text                   |           |          |         | extended |              | 
 date_of_stop            | date                   |           |          |         | plain    |              | 
 time_of_stop            | time without time zone |           |          |         | plain    |              | 
 agency                  | text                   |           |          |         | extended |              | 
 subagency               | text                   |           |          |         | extended |              | 
 description             | text                   |           |          |         | extended |              | 
 location                | text                   |           |          |         | extended |              | 
 latitude                | numeric                |           |          |         | main     |              | 
 longitude               | numeric                |           |          |         | main     |              | 
 accident                | text                   |           |          |         | extended |              | 
 belts                   | boolean                |           |          |         | plain    |              | 
 personal_injury         | boolean                |           |          |         | plain    |              | 
 property_damage         | boolean                |           |          |         | plain    |              | 
 fatal                   | boolean                |           |          |         | plain    |              | 
 commercial_license      | boolean                |           |          |         | plain    |              | 
 hazmat                  | boolean                |           |          |         | plain    |              | 
 commercial_vehicle      | boolean                |           |          |         | plain    |              | 
 alcohol                 | boolean                |           |          |         | plain    |              | 
 workzone                | boolean                |           |          |         | plain    |              | 
 state                   | text                   |           |          |         | extended |              | 
 vehicletype             | text                   |           |          |         | extended |              | 
 year                    | smallint               |           |          |         | plain    |              | 
 make                    | text                   |           |          |         | extended |              | 
 model                   | text                   |           |          |         | extended |              | 
 color                   | text                   |           |          |         | extended |              | 
 violation_type          | text                   |           |          |         | extended |              | 
 charge                  | text                   |           |          |         | extended |              | 
 article                 | text                   |           |          |         | extended |              | 
 contributed_to_accident | boolean                |           |          |         | plain    |              | 
 race                    | text                   |           |          |         | extended |              | 
 gender                  | text                   |           |          |         | extended |              | 
 driver_city             | text                   |           |          |         | extended |              | 
 driver_state            | text                   |           |          |         | extended |              | 
 dl_state                | text                   |           |          |         | extended |              | 
 arrest_type             | text                   |           |          |         | extended |              | 
 geolocation             | point                  |           |          |         | plain    |              | 
 council_districts       | smallint               |           |          |         | plain    |              | 
 councils                | smallint               |           |          |         | plain    |              | 
 communities             | smallint               |           |          |         | plain    |              | 
 zip_codes               | smallint               |           |          |         | plain    |              | 
 municipalities          | smallint               |           |          |         | plain    |              | 
Partition key: LIST (violation_type)
Partitions: traffic_violations_p_list_citation FOR VALUES IN ('Citation'),
            traffic_violations_p_list_esero FOR VALUES IN ('ESERO'),
            traffic_violations_p_list_sero FOR VALUES IN ('SERO'),
            traffic_violations_p_list_warning FOR VALUES IN ('Warning'),
            traffic_violations_p_list_default DEFAULT

Loading the data and checking the amount of data for each partition:

postgres=# insert into traffic_violations_p_list
postgres-#        select * from mv_traffic_violations;
INSERT 0 1528078
postgres=# select count(*) from traffic_violations_p_list_warning;
 count  
--------
 759747
(1 row)

postgres=# select count(*) from traffic_violations_p_list_sero;
 count 
-------
   899
(1 row)

postgres=# select count(*) from traffic_violations_p_list_Citation;
 count  
--------
 693226
(1 row)

postgres=# select count(*) from traffic_violations_p_list_ESERO;
 count 
-------
 74206
(1 row)

postgres=# select count(*) from traffic_violations_p_list_default;
 count 
-------
     0
(1 row)
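
Before wrapping up, we can quickly confirm that the planner prunes list partitions when the key value is known at planning time (a sketch; the exact plan output depends on your data and settings, but it should reference only traffic_violations_p_list_sero):

explain (costs off)
select count(*)
  from traffic_violations_p_list
 where violation_type = 'SERO';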

Done. That is list partitioning: instead of partitioning by a range you partition by an explicit set of key values. In the next post we will have a look at hash partitioning.


PostgreSQL partitioning (2): Range partitioning

Mon, 2019-06-03 01:00

Now that the data set is ready we will look at the first partitioning strategy: range partitioning. Usually range partitioning is used to partition a table by days, months or years, although you can partition by other data types as well. Time/date-based range partitioning is probably the most common use case, so we will partition our traffic violation data by year and later on sub-partition that by month.
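
As a minimal, generic sketch of the syntax (names invented for illustration; note that the lower bound of a range partition is inclusive while the upper bound is exclusive):

-- one partition covering all of 2019
create table t_range ( d date )
partition by range (d);

create table t_range_2019
partition of t_range
for values from ('2019-01-01') to ('2020-01-01');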

Going back to the materialized view we created in the first post, this is its structure:

postgres=# \d mv_traffic_violations 
                 Materialized view "public.mv_traffic_violations"
         Column          |          Type          | Collation | Nullable | Default 
-------------------------+------------------------+-----------+----------+---------
 seqid                   | text                   |           |          | 
 date_of_stop            | date                   |           |          | 
 time_of_stop            | time without time zone |           |          | 
 agency                  | text                   |           |          | 
 subagency               | text                   |           |          | 
 description             | text                   |           |          | 
 location                | text                   |           |          | 
 latitude                | numeric                |           |          | 
 longitude               | numeric                |           |          | 
 accident                | text                   |           |          | 
 belts                   | boolean                |           |          | 
 personal_injury         | boolean                |           |          | 
 property_damage         | boolean                |           |          | 
 fatal                   | boolean                |           |          | 
 commercial_license      | boolean                |           |          | 
 hazmat                  | boolean                |           |          | 
 commercial_vehicle      | boolean                |           |          | 
 alcohol                 | boolean                |           |          | 
 workzone                | boolean                |           |          | 
 state                   | text                   |           |          | 
 vehicletype             | text                   |           |          | 
 year                    | smallint               |           |          | 
 make                    | text                   |           |          | 
 model                   | text                   |           |          | 
 color                   | text                   |           |          | 
 violation_type          | text                   |           |          | 
 charge                  | text                   |           |          | 
 article                 | text                   |           |          | 
 contributed_to_accident | boolean                |           |          | 
 race                    | text                   |           |          | 
 gender                  | text                   |           |          | 
 driver_city             | text                   |           |          | 
 driver_state            | text                   |           |          | 
 dl_state                | text                   |           |          | 
 arrest_type             | text                   |           |          | 
 geolocation             | point                  |           |          | 
 council_districts       | smallint               |           |          | 
 councils                | smallint               |           |          | 
 communities             | smallint               |           |          | 
 zip_codes               | smallint               |           |          | 
 municipalities          | smallint               |           |          | 

Let's assume our most common queries against the data set restrict the data to one or more years. Using partitioned tables we can range partition on the “date_of_stop” column and break the large data set down into smaller chunks per year. How would that work in PostgreSQL?

Currently we have these min and max values for the “date_of_stop” column:

postgres=# select min(date_of_stop), max(date_of_stop) from mv_traffic_violations;
    min     |    max     
------------+------------
 2012-01-01 | 2019-06-01
(1 row)

If we want to partition that by year we need at least 8 partitions (2012 to 2019), and we might already add a partition for 2020 so we are on the safe side when a more recent data set brings data for the next year. The first step is to create a partitioned table:

create table traffic_violations_p
( seqid text
, date_of_stop date not null
, time_of_stop time
, agency text
, subagency text
, description text 
, location text
, latitude numeric
, longitude numeric
, accident text
, belts boolean
, personal_injury boolean 
, property_damage boolean
, fatal boolean
, commercial_license boolean
, hazmat boolean
, commercial_vehicle boolean
, alcohol boolean
, workzone boolean
, state text
, vehicletype text 
, year smallint
, make text
, model text
, color text
, violation_type text 
, charge text
, article text
, contributed_to_accident boolean
, race text
, gender text
, driver_city text
, driver_state text
, dl_state text
, arrest_type text
, geolocation point
, council_districts smallint 
, councils smallint
, communities smallint
, zip_codes smallint
, municipalities smallint)
partition by range (date_of_stop);

Now we have a partitioned table with no partitions:

postgres=# \d traffic_violations_p
                  Partitioned table "public.traffic_violations_p"
         Column          |          Type          | Collation | Nullable | Default 
-------------------------+------------------------+-----------+----------+---------
 seqid                   | text                   |           |          | 
 date_of_stop            | date                   |           | not null | 
 time_of_stop            | time without time zone |           |          | 
 agency                  | text                   |           |          | 
 subagency               | text                   |           |          | 
 description             | text                   |           |          | 
 location                | text                   |           |          | 
 latitude                | numeric                |           |          | 
 longitude               | numeric                |           |          | 
 accident                | text                   |           |          | 
 belts                   | boolean                |           |          | 
 personal_injury         | boolean                |           |          | 
 property_damage         | boolean                |           |          | 
 fatal                   | boolean                |           |          | 
 commercial_license      | boolean                |           |          | 
 hazmat                  | boolean                |           |          | 
 commercial_vehicle      | boolean                |           |          | 
 alcohol                 | boolean                |           |          | 
 workzone                | boolean                |           |          | 
 state                   | text                   |           |          | 
 vehicletype             | text                   |           |          | 
 year                    | smallint               |           |          | 
 make                    | text                   |           |          | 
 model                   | text                   |           |          | 
 color                   | text                   |           |          | 
 violation_type          | text                   |           |          | 
 charge                  | text                   |           |          | 
 article                 | text                   |           |          | 
 contributed_to_accident | boolean                |           |          | 
 race                    | text                   |           |          | 
 gender                  | text                   |           |          | 
 driver_city             | text                   |           |          | 
 driver_state            | text                   |           |          | 
 dl_state                | text                   |           |          | 
 arrest_type             | text                   |           |          | 
 geolocation             | point                  |           |          | 
 council_districts       | smallint               |           |          | 
 councils                | smallint               |           |          | 
 communities             | smallint               |           |          | 
 zip_codes               | smallint               |           |          | 
 municipalities          | smallint               |           |          | 
Partition key: RANGE (date_of_stop)
Number of partitions: 0

If we try to insert data without any partitions this will obviously fail:

postgres=# insert into traffic_violations_p (date_of_Stop) values ( now() );
psql: ERROR:  no partition of relation "traffic_violations_p" found for row
DETAIL:  Partition key of the failing row contains (date_of_stop) = (2019-05-28).

We need partitions, and this is how you add them to the partitioned table:

create table traffic_violations_p_2012
partition of traffic_violations_p
for values from ('2012-01-01') to ('2012-12-31');

create table traffic_violations_p_2013
partition of traffic_violations_p
for values from ('2013-01-01') to ('2013-12-31');

create table traffic_violations_p_2014
partition of traffic_violations_p
for values from ('2014-01-01') to ('2014-12-31');

create table traffic_violations_p_2015
partition of traffic_violations_p
for values from ('2015-01-01') to ('2015-12-31');

create table traffic_violations_p_2016
partition of traffic_violations_p
for values from ('2016-01-01') to ('2016-12-31');

create table traffic_violations_p_2017
partition of traffic_violations_p
for values from ('2017-01-01') to ('2017-12-31');

create table traffic_violations_p_2018
partition of traffic_violations_p
for values from ('2018-01-01') to ('2018-12-31');

create table traffic_violations_p_2019
partition of traffic_violations_p
for values from ('2019-01-01') to ('2019-12-31');

create table traffic_violations_p_2020
partition of traffic_violations_p
for values from ('2020-01-01') to ('2020-12-31');

We now have a partition for each year from 2012 to 2020:

postgres=# \d+ traffic_violations_p
                                      Partitioned table "public.traffic_violations_p"
         Column          |          Type          | Collation | Nullable | Default | Storage  | Stats target | Description 
-------------------------+------------------------+-----------+----------+---------+----------+--------------+-------------
 seqid                   | text                   |           |          |         | extended |              | 
 date_of_stop            | date                   |           |          |         | plain    |              | 
 time_of_stop            | time without time zone |           |          |         | plain    |              | 
 agency                  | text                   |           |          |         | extended |              | 
 subagency               | text                   |           |          |         | extended |              | 
 description             | text                   |           |          |         | extended |              | 
 location                | text                   |           |          |         | extended |              | 
 latitude                | numeric                |           |          |         | main     |              | 
 longitude               | numeric                |           |          |         | main     |              | 
 accident                | text                   |           |          |         | extended |              | 
 belts                   | boolean                |           |          |         | plain    |              | 
 personal_injury         | boolean                |           |          |         | plain    |              | 
 property_damage         | boolean                |           |          |         | plain    |              | 
 fatal                   | boolean                |           |          |         | plain    |              | 
 commercial_license      | boolean                |           |          |         | plain    |              | 
 hazmat                  | boolean                |           |          |         | plain    |              | 
 commercial_vehicle      | boolean                |           |          |         | plain    |              | 
 alcohol                 | boolean                |           |          |         | plain    |              | 
 workzone                | boolean                |           |          |         | plain    |              | 
 state                   | text                   |           |          |         | extended |              | 
 vehicletype             | text                   |           |          |         | extended |              | 
 year                    | smallint               |           |          |         | plain    |              | 
 make                    | text                   |           |          |         | extended |              | 
 model                   | text                   |           |          |         | extended |              | 
 color                   | text                   |           |          |         | extended |              | 
 violation_type          | text                   |           |          |         | extended |              | 
 charge                  | text                   |           |          |         | extended |              | 
 article                 | text                   |           |          |         | extended |              | 
 contributed_to_accident | boolean                |           |          |         | plain    |              | 
 race                    | text                   |           |          |         | extended |              | 
 gender                  | text                   |           |          |         | extended |              | 
 driver_city             | text                   |           |          |         | extended |              | 
 driver_state            | text                   |           |          |         | extended |              | 
 dl_state                | text                   |           |          |         | extended |              | 
 arrest_type             | text                   |           |          |         | extended |              | 
 geolocation             | point                  |           |          |         | plain    |              | 
 council_districts       | smallint               |           |          |         | plain    |              | 
 councils                | smallint               |           |          |         | plain    |              | 
 communities             | smallint               |           |          |         | plain    |              | 
 zip_codes               | smallint               |           |          |         | plain    |              | 
 municipalities          | smallint               |           |          |         | plain    |              | 
Partition key: RANGE (date_of_stop)
Partitions: traffic_violations_p_2012 FOR VALUES FROM ('2012-01-01') TO ('2012-12-31'),
            traffic_violations_p_2013 FOR VALUES FROM ('2013-01-01') TO ('2013-12-31'),
            traffic_violations_p_2014 FOR VALUES FROM ('2014-01-01') TO ('2014-12-31'),
            traffic_violations_p_2015 FOR VALUES FROM ('2015-01-01') TO ('2015-12-31'),
            traffic_violations_p_2016 FOR VALUES FROM ('2016-01-01') TO ('2016-12-31'),
            traffic_violations_p_2017 FOR VALUES FROM ('2017-01-01') TO ('2017-12-31'),
            traffic_violations_p_2018 FOR VALUES FROM ('2018-01-01') TO ('2018-12-31'),
            traffic_violations_p_2019 FOR VALUES FROM ('2019-01-01') TO ('2019-12-31'),
            traffic_violations_p_2020 FOR VALUES FROM ('2020-01-01') TO ('2020-12-31')

Doing the same insert as before now succeeds because there is a partition the row can go to:

postgres=# insert into traffic_violations_p (date_of_Stop) values ( now() );
INSERT 0 1
postgres=# delete from traffic_violations_p;
DELETE 1
postgres=# 

What still would fail is an insert with a date before 2012 or after 2020:

postgres=# insert into traffic_violations_p (date_of_Stop) values ( now() + interval '2 years' );
psql: ERROR:  no partition of relation "traffic_violations_p" found for row
DETAIL:  Partition key of the failing row contains (date_of_stop) = (2021-05-28).

To avoid that we can create a “default” partition:

CREATE TABLE traffic_violations_p_default
    PARTITION OF traffic_violations_p DEFAULT;

A default partition will hold all the rows that do not match any of the existing partition definitions:

postgres=# select (date_of_stop) from traffic_violations_p_default;
 date_of_stop 
--------------
 2021-05-28
(1 row)
postgres=# delete from traffic_violations_p;
DELETE 1

As our partitioned table setup is now complete we can load the data:

postgres=# insert into traffic_violations_p select * from mv_traffic_violations;
INSERT 0 1528078

All rows were loaded successfully, so we can check the counts for each partition:

postgres=# select count(*) from traffic_violations_p_2012;
 count  
--------
 150989
(1 row)

postgres=# select count(*) from traffic_violations_p_2013;
 count  
--------
 189946
(1 row)

postgres=# select count(*) from traffic_violations_p_2014;
 count  
--------
 223248
(1 row)

postgres=# select count(*) from traffic_violations_p_2015;
 count  
--------
 234468
(1 row)

postgres=# select count(*) from traffic_violations_p_2016;
 count  
--------
 219227
(1 row)

postgres=# select count(*) from traffic_violations_p_2017;
 count  
--------
 198879
(1 row)

postgres=# select count(*) from traffic_violations_p_2018;
 count  
--------
 217911
(1 row)

postgres=# select count(*) from traffic_violations_p_2019;
 count 
-------
 89823
(1 row)

postgres=# select count(*) from traffic_violations_p_2020;
 count 
-------
     0
(1 row)

postgres=# select count(*) from traffic_violations_p_default;
 count 
-------
  3588
(1 row)
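
A side note before moving on: instead of querying each partition separately, the same numbers can be fetched in one statement by grouping on the hidden tableoid system column, which identifies the partition a row is physically stored in. A sketch:

postgres=# select tableoid::regclass as partition, count(*)
             from traffic_violations_p
            group by tableoid
            order by 1;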

Why do we have so many rows in the default partition? We checked the min and max values of the partition key and there should be no rows before 2012 or after 2019.
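
For reference, that check is just a min/max aggregate against the materialized view that fed the load (a sketch of the statement):

postgres=# select min(date_of_stop), max(date_of_stop) from mv_traffic_violations;

But the distinct dates in the default partition tell a different story: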

postgres=# select distinct date_of_stop from traffic_violations_p_default order by 1;
 date_of_stop 
--------------
 2012-12-31
 2013-12-31
 2014-12-31
 2015-12-31
 2016-12-31
 2017-12-31
 2018-12-31
(7 rows)

The issue is that the upper bound of a range partition is exclusive, so we did not set up the partitions correctly. The correct way of doing it is:

drop table traffic_violations_p_2012, traffic_violations_p_2013, traffic_violations_p_2014, traffic_violations_p_2015, traffic_violations_p_2016, traffic_violations_p_2017, traffic_violations_p_2018, traffic_violations_p_2019, traffic_violations_p_2020;

create table traffic_violations_p_2012
partition of traffic_violations_p
for values from ('2012-01-01') to ('2013-01-01');

create table traffic_violations_p_2013
partition of traffic_violations_p
for values from ('2013-01-01') to ('2014-01-01');

create table traffic_violations_p_2014
partition of traffic_violations_p
for values from ('2014-01-01') to ('2015-01-01');

create table traffic_violations_p_2015
partition of traffic_violations_p
for values from ('2015-01-01') to ('2016-01-01');

create table traffic_violations_p_2016
partition of traffic_violations_p
for values from ('2016-01-01') to ('2017-01-01');

create table traffic_violations_p_2017
partition of traffic_violations_p
for values from ('2017-01-01') to ('2018-01-01');

create table traffic_violations_p_2018
partition of traffic_violations_p
for values from ('2018-01-01') to ('2019-01-01');

create table traffic_violations_p_2019
partition of traffic_violations_p
for values from ('2019-01-01') to ('2020-01-01');

create table traffic_violations_p_2020
partition of traffic_violations_p
for values from ('2020-01-01') to ('2021-01-01');

delete from traffic_violations_p;
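
A side note: as the goal is just to empty the table before reloading, truncate would do the same faster; on a partitioned table it cascades to all partitions, including the default one:

truncate traffic_violations_p;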

Once the partitions are set up as required, the data is correctly stored:

postgres=# insert into traffic_violations_p select * from mv_traffic_violations;
INSERT 0 1528078
postgres=# select count(*) from traffic_violations_p_default ;
 count 
-------
     0
(1 row)
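
As a quick teaser of why all this matters: the planner can skip partitions that cannot contain matching rows, so a query restricted to a single year only has to touch that year's partition (a sketch; the resulting plan would reference traffic_violations_p_2013 only):

postgres=# explain select count(*)
             from traffic_violations_p
            where date_of_stop between '2013-01-01' and '2013-06-30';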

Instead of having one big table containing all the data, we now have 10 partitions splitting the table into smaller chunks. Why does that matter? That will be the topic of another post, after we have had a look at list and hash partitioning in the next posts.

This article, PostgreSQL partitioning (2): Range partitioning, appeared first on Blog dbi services.

PostgreSQL partitioning (1): Preparing the data set

Sun, 2019-06-02 08:20

This is the start of a series about partitioning in PostgreSQL. For this and the following posts I will use PostgreSQL 12 (which currently is in beta), so some of this might not work if you are on PostgreSQL 11 or even on PostgreSQL 10, where declarative partitioning was introduced. This very first post is only about preparing the data set, and I decided to separate it from the next post because it shows various things around loading data into PostgreSQL and therefore might be useful on its own.

When you are looking for free data sets, “The home of the U.S. Government’s open data” is a great source. It provides free data sets for research, development or just data you can play with for whatever reason. For the scope of this and the following posts we will use the Traffic violations data set. It contains more than 1.5 million rows and is a good starting point for diving into PostgreSQL partitioning.

The initial import goes into a table which only contains text columns and from there we can load our partitions later on. The structure of the table is:

create table traffic_violations
( seqid text
, date_of_stop text
, time_of_stop text
, agency text
, subagency text
, description text
, location text
, latitude text
, longitude text 
, accident text
, belts text
, personal_injury text
, property_damage text
, fatal text
, commercial_license text
, hazmat text
, commercial_vehicle text
, alcohol text
, workzone text
, state text
, vehicletype text
, year text
, make text
, model text
, color text
, violation_type text
, charge text
, article text
, contributed_to_accident text
, race text
, gender text
, driver_city text
, driver_state text
, dl_state text
, arrest_type text
, geolocation text
, council_districts text
, councils text
, communities text
, zip_codes text
, municipalities text
);

For loading the table the easiest way is to use PostgreSQL’s copy command:

postgres=# \! ls -l Traffic_Violations.csv
-rw-rw-r--. 1 postgres postgres 634161687 May 28 11:28 Traffic_Violations.csv
postgres=# copy traffic_violations from '/home/postgres/Traffic_Violations.csv' with ( format csv, header true, null 'null', delimiter ',');
COPY 1528078
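
Note that copy reads the file on the database server. If the file were on a client machine instead, psql's \copy meta-command would stream it through the connection and accepts the same options (a sketch):

postgres=# \copy traffic_violations from 'Traffic_Violations.csv' with ( format csv, header true, null 'null', delimiter ',')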

So we have around 1.5 million rows of real data. As this table contains only text columns, we will create a materialized view with data types that fit the data. Looking at one row of the table:

postgres=# \x
Expanded display is on.
postgres=# select * from traffic_violations limit 1;
-[ RECORD 1 ]-----------+-------------------------------------------------------------------------
seqid                   | e13d2082-55a7-4a93-8836-173be19d2648
date_of_stop            | 06/01/2019
time_of_stop            | 13:38:00
agency                  | MCP
subagency               | 2nd District, Bethesda
description             | DRIVER FAILURE TO YIELD RIGHT-OF-WAY TO VEH. UPON EMERGING FROM DRIVEWAY
location                | RANDOLPH RD / NEW HAMPSHIRE
latitude                | 39.07592
longitude               | -77.0011316666667
accident                | Yes
belts                   | No
personal_injury         | No
property_damage         | Yes
fatal                   | No
commercial_license      | No
hazmat                  | No
commercial_vehicle      | No
alcohol                 | No
workzone                | No
state                   | MD
vehicletype             | 02 - Automobile
year                    | 2014
make                    | HYUN
model                   | TK
color                   | SILVER
violation_type          | Warning
charge                  | 21-705(c)
article                 | Transportation Article
contributed_to_accident | false
race                    | WHITE
gender                  | F
driver_city             | SILVER SPRING
driver_state            | MD
dl_state                | MD
arrest_type             | A - Marked Patrol
geolocation             | (39.07592, -77.0011316666667)
council_districts       | 5
councils                | 5
communities             | 26
zip_codes               | 12
municipalities          | 1

The materialized view becomes:

create materialized view mv_traffic_violations
( seqid 
, date_of_stop  
, time_of_stop 
, agency  
, subagency  
, description 
, location 
, latitude 
, longitude 
, accident 
, belts 
, personal_injury 
, property_damage 
, fatal 
, commercial_license 
, hazmat 
, commercial_vehicle 
, alcohol 
, workzone 
, state 
, vehicletype 
, year 
, make 
, model 
, color 
, violation_type 
, charge 
, article 
, contributed_to_accident 
, race 
, gender 
, driver_city 
, driver_state 
, dl_state 
, arrest_type 
, geolocation 
, council_districts 
, councils 
, communities 
, zip_codes 
, municipalities 
)
as
select seqid
     , to_date(date_of_stop,'MM/DD/YYYY')
     , time_of_stop::time
     , agency
     , subagency 
     , description
     , location
     , latitude::numeric
     , longitude::numeric
     , accident
     , belts::boolean
     , personal_injury::boolean
     , property_damage::boolean
     , fatal::boolean
     , commercial_license::boolean
     , hazmat::boolean
     , commercial_vehicle::boolean
     , alcohol::boolean
     , workzone::boolean
     , state
     , vehicletype
     , case year
         when '' then null
         else year::smallint
       end
     , make
     , model
     , color
     , violation_type
     , charge
     , article
     , contributed_to_accident::boolean
     , race
     , gender
     , driver_city
     , driver_state
     , dl_state
     , arrest_type
     , geolocation::point
     , case council_districts
         when '' then null
         else council_districts::smallint
       end
     , case councils
         when '' then null
         else councils::smallint
       end
     , case communities
         when '' then null
         else communities::smallint
       end
     , case zip_codes
         when '' then null
         else zip_codes::smallint
       end
     , case municipalities
         when '' then null
         else municipalities::smallint
       end
  from traffic_violations;

The statement contains some “case” expressions to test for empty strings. If you do not do this, you get errors like the following when you cast to a specific data type:

psql: ERROR:  invalid input syntax for type smallint: ""
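
An equivalent, more compact way of writing those empty-string tests is nullif(), which returns NULL when both arguments are equal. A sketch for one of the columns:

postgres=# select nullif(year,'')::smallint from traffic_violations limit 3;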

The beauty of a materialized view is that you can refresh it whenever the underlying data set changes, e.g.:

postgres=# refresh materialized view mv_traffic_violations WITH data;
REFRESH MATERIALIZED VIEW

Note that this will block access to the materialized view for the duration of the refresh. You could avoid that by refreshing concurrently, but that only works when you have a unique index on the materialized view. With this data set we cannot create a unique index because some of the rows are duplicates, e.g.:

postgres=# select * from mv_traffic_violations where seqid='b87c908c-ce2d-4c10-89fa-ca48735af485' and date_of_stop = '2012-11-07' and time_of_stop = '05:07:00' and description = 'FAILURE OF VEH. DRIVER IN ACCIDENT TO LOCATE AND NOTIFY OWNER OF UNATTENDED VEH. OF DAMAGE' and charge = '20-105(b)' and violation_type = 'Citation';
-[ RECORD 1 ]-----------+-------------------------------------------------------------------------------------------
seqid                   | b87c908c-ce2d-4c10-89fa-ca48735af485
date_of_stop            | 2012-11-07
time_of_stop            | 05:07:00
agency                  | MCP
subagency               | 2nd District, Bethesda
description             | FAILURE OF VEH. DRIVER IN ACCIDENT TO LOCATE AND NOTIFY OWNER OF UNATTENDED VEH. OF DAMAGE
location                | IFO 2531 ROSS RD
latitude                | 39.0222578333333
longitude               | -77.04575825
accident                | Yes
belts                   | f
personal_injury         | f
property_damage         | t
fatal                   | f
commercial_license      | f
hazmat                  | f
commercial_vehicle      | f
alcohol                 | f
workzone                | f
state                   | MD
vehicletype             | 02 - Automobile
year                    | 2002
make                    | SUBARU
model                   | FORESTER
color                   | SILVER
violation_type          | Citation
charge                  | 20-105(b)
article                 | Transportation Article
contributed_to_accident | f
race                    | HISPANIC
gender                  | M
driver_city             | SILVER SPRING
driver_state            | MD
dl_state                | MD
arrest_type             | A - Marked Patrol
geolocation             | (39.022257833333,-77.04575825)
council_districts       | 5
councils                | 5
communities             | 10
zip_codes               | 17
municipalities          | 1
-[ RECORD 2 ]-----------+-------------------------------------------------------------------------------------------
seqid                   | b87c908c-ce2d-4c10-89fa-ca48735af485
date_of_stop            | 2012-11-07
time_of_stop            | 05:07:00
agency                  | MCP
subagency               | 2nd District, Bethesda
description             | FAILURE OF VEH. DRIVER IN ACCIDENT TO LOCATE AND NOTIFY OWNER OF UNATTENDED VEH. OF DAMAGE
location                | IFO 2531 ROSS RD
latitude                | 39.0222578333333
longitude               | -77.04575825
accident                | Yes
belts                   | f
personal_injury         | f
property_damage         | t
fatal                   | f
commercial_license      | f
hazmat                  | f
commercial_vehicle      | f
alcohol                 | f
workzone                | f
state                   | MD
vehicletype             | 02 - Automobile
year                    | 2002
make                    | SUBARU
model                   | FORESTER
color                   | SILVER
violation_type          | Citation
charge                  | 20-105(b)
article                 | Transportation Article
contributed_to_accident | f
race                    | HISPANIC
gender                  | M
driver_city             | SILVER SPRING
driver_state            | MD
dl_state                | MD
arrest_type             | A - Marked Patrol
geolocation             | (39.022257833333,-77.04575825)
council_districts       | 5
councils                | 5
communities             | 10
zip_codes               | 17
municipalities          | 1
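
For completeness: if the duplicates were cleaned up first, the non-blocking variant would be a unique index plus a concurrent refresh. A sketch with a made-up index name, assuming seqid then uniquely identified a row:

create unique index mv_traffic_violations_uk on mv_traffic_violations (seqid);
refresh materialized view concurrently mv_traffic_violations;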

So now we have the data set we can use for the partitioning examples in the next posts. To summarize, what did we do:

  • https://www.data.gov/ is a great source for free data sets and we used one of them to load data into PostgreSQL
  • copy is great for loading data into PostgreSQL
  • The “case” expression is great for testing various conditions in a statement
  • Materialized views are great when you want to refresh from an underlying data set that changes, but you need a unique index for refreshing concurrently
  • You might need to cast one data type into another

In the next post we will look at the first partitioning strategy: Range partitioning.

This article, PostgreSQL partitioning (1): Preparing the data set, appeared first on Blog dbi services.

PostgreSQL 12: New partition reporting functions

Sun, 2019-06-02 01:00

PostgreSQL 10 introduced declarative partitioning (with some limitations), PostgreSQL 11 improved that a lot (Updating the partition key now works in PostgreSQL 11, Insert…on conflict with partitions finally works in PostgreSQL 11, Local partitioned indexes in PostgreSQL 11, Hash Partitioning in PostgreSQL 11) and PostgreSQL 12 goes even further. Besides the fact that foreign keys can now reference partitioned tables, three new functions made it into PostgreSQL 12 that give you information about your partitioned tables.

Our little demo setup consists of a list-partitioned table with three partitions:

postgres=# create table animals ( id int generated always as identity ( cache 10 ),
postgres(#                        name text unique,
postgres(#                        primary key(id,name)
postgres(#                      ) 
postgres-#                      partition by list (name);
CREATE TABLE
postgres=# create table animals_elephants
postgres-#   partition of animals
postgres-#   for values in ('elephant');
CREATE TABLE
postgres=# create table animals_cats
postgres-#   partition of animals
postgres-#   for values in ('cats');
CREATE TABLE
postgres=# create table animals_dogs
postgres-#   partition of animals
postgres-#   for values in ('dogs');
CREATE TABLE
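
A side note on the foreign key improvement mentioned at the beginning: since PostgreSQL 12 a plain table can reference this partitioned table directly, as the primary key on animals provides the required unique index. A sketch (the feedings table is made up for illustration):

create table feedings ( animal_id int
                      , animal_name text
                      , foreign key (animal_id, animal_name) references animals (id, name)
                      );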

What already worked in PostgreSQL 11 is that psql will give you information about your partitions:

postgres=# \d animals
                   Partitioned table "public.animals"
 Column |  Type   | Collation | Nullable |           Default            
--------+---------+-----------+----------+------------------------------
 id     | integer |           | not null | generated always as identity
 name   | text    |           | not null | 
Partition key: LIST (name)
Indexes:
    "animals_pkey" PRIMARY KEY, btree (id, name)
    "animals_name_key" UNIQUE CONSTRAINT, btree (name)
Number of partitions: 3 (Use \d+ to list them.)

Using “\d+” will even show you more information:

postgres=# \d+ animals
                                       Partitioned table "public.animals"
 Column |  Type   | Collation | Nullable |           Default            | Storage  | Stats target | Description 
--------+---------+-----------+----------+------------------------------+----------+--------------+-------------
 id     | integer |           | not null | generated always as identity | plain    |              | 
 name   | text    |           | not null |                              | extended |              | 
Partition key: LIST (name)
Indexes:
    "animals_pkey" PRIMARY KEY, btree (id, name)
    "animals_name_key" UNIQUE CONSTRAINT, btree (name)
    "animals_i1" btree (name)
Partitions: animals_cats FOR VALUES IN ('cats'),
            animals_dogs FOR VALUES IN ('dogs'),
            animals_elephants FOR VALUES IN ('elephant')

Now with PostgreSQL 12 there are three new functions which you can use to get information about your partitioned tables and the partitions themselves. The first one will give you the partition tree:

postgres=# select pg_partition_tree('animals');
        pg_partition_tree        
---------------------------------
 (animals,,f,0)
 (animals_elephants,animals,t,1)
 (animals_dogs,animals,t,1)
 (animals_cats,animals,t,1)
(4 rows)
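
Each output row contains four fields: the relation itself, its direct parent, whether it is a leaf, and its level below the root. Selecting the columns by name makes that explicit (a sketch):

postgres=# select relid, parentrelid, isleaf, level
             from pg_partition_tree('animals');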

The second one will give you the ancestor relations of the given partition:

postgres=# select pg_partition_ancestors('animals_dogs');
 pg_partition_ancestors 
------------------------
 animals_dogs
 animals
(2 rows)

The third one will give you the root for a given partition:

postgres=# select pg_partition_root('animals_cats');
 pg_partition_root 
-------------------
 animals
(1 row)
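
A practical application of pg_partition_tree is aggregating over all partitions at once, e.g. summing the size of the whole partitioned table (a sketch):

postgres=# select pg_size_pretty(sum(pg_relation_size(relid))) as total_size
             from pg_partition_tree('animals');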

All of them also work for partitioned indexes:

postgres=# create index animals_i1 on animals (name);
CREATE INDEX
postgres=# select pg_partition_tree('animals_i1');
              pg_partition_tree              
---------------------------------------------
 (animals_i1,,f,0)
 (animals_cats_name_idx,animals_i1,t,1)
 (animals_dogs_name_idx,animals_i1,t,1)
 (animals_elephants_name_idx,animals_i1,t,1)
(4 rows)

Nice.

This article, PostgreSQL 12: New partition reporting functions, appeared first on Blog dbi services.

WebLogic – Upgrade of Java

Sat, 2019-06-01 03:05

If you are used to Java, you probably know that really important security fixes are published every quarter in the Java Critical Patch Update (CPU) and that it’s really important to keep your installation up to date. Upgrading a Java installation itself is easy: you just deploy the new version entirely. In most cases, it’s also pretty easy to update the components that are using Java (e.g. application servers like Apache Tomcat), since most of the time they rely on an environment variable (JAVA_HOME) to know which Java should be used. On the other hand, there is the WebLogic case, and that’s where all the “fun” is…

In a previous blog, I talked about the JAVA_HOME management inside the WebLogic Server files. If you want to upgrade the Java version that is used by your WebLogic Server, you basically have three choices, so let’s look at them in detail:

 

I. Use a symlink for the JAVA_HOME

You might be tempted to do something like that:

[weblogic@weblogic_server_01 ~]$ cd /app/Java/
[weblogic@weblogic_server_01 Java]$
[weblogic@weblogic_server_01 Java]$ ls -l
total 0
drwxr-xr-x. 1 weblogic weblogic 17 May 18 13:55 jdk1.8.0_151
drwxr-xr-x. 1 weblogic weblogic 17 May 18 14:02 jdk1.8.0_192
[weblogic@weblogic_server_01 Java]$
[weblogic@weblogic_server_01 Java]$ ln -s jdk1.8.0_151/ jdk
[weblogic@weblogic_server_01 Java]$ ls -l
total 0
lrwxrwxrwx. 1 weblogic weblogic 13 May 18 14:05 jdk -> jdk1.8.0_151/
drwxr-xr-x. 1 weblogic weblogic 17 May 18 13:55 jdk1.8.0_151
drwxr-xr-x. 1 weblogic weblogic 17 May 18 14:02 jdk1.8.0_192
[weblogic@weblogic_server_01 Java]$
[weblogic@weblogic_server_01 Java]$ JAVA_HOME=/app/Java/jdk
[weblogic@weblogic_server_01 Java]$
[weblogic@weblogic_server_01 Java]$ $JAVA_HOME/bin/java -version
java version "1.8.0_151"
Java(TM) SE Runtime Environment (build 1.8.0_151-b12)
Java HotSpot(TM) 64-Bit Server VM (build 25.151-b12, mixed mode)
[weblogic@weblogic_server_01 Java]$
[weblogic@weblogic_server_01 Java]$ rm jdk
[weblogic@weblogic_server_01 Java]$ ln -s jdk1.8.0_192/ jdk
[weblogic@weblogic_server_01 Java]$ ls -l
total 0
lrwxrwxrwx. 1 weblogic weblogic 13 May 18 14:06 jdk -> jdk1.8.0_192/
drwxr-xr-x. 1 weblogic weblogic 17 May 18 13:55 jdk1.8.0_151
drwxr-xr-x. 1 weblogic weblogic 17 May 18 14:02 jdk1.8.0_192
[weblogic@weblogic_server_01 Java]$
[weblogic@weblogic_server_01 Java]$ $JAVA_HOME/bin/java -version
java version "1.8.0_192"
Java(TM) SE Runtime Environment (build 1.8.0_192-b12)
Java HotSpot(TM) 64-Bit Server VM (build 25.192-b12, mixed mode)
[weblogic@weblogic_server_01 Java]$

 

And then you would use “/app/Java/jdk” (a symlink) for WebLogic. Please don’t, it’s a terrible idea!

You might think that using a symlink is a good idea, and I thought so too at the beginning… But Oracle officially doesn’t support it and, worse than that, it will not stay in place. You can technically use a symlink for your WebLogic Server and it will run (even if it’s not officially supported), but as soon as you run an installer/configurator, it will overwrite and replace the symlink path (“/app/Java/jdk“) in the WebLogic files with the path of the target folder (“/app/Java/jdk1.8.0_192“).

It’s unfortunate, really: WebLogic will run as planned and use “/app/Java/jdk/bin/java” at the beginning, but as soon as you upgrade WebLogic or run an installer, you will find out that WebLogic switched to “/app/Java/jdk1.8.0_192/bin/java” on its own and overwrote the configuration that you put in place.

If you are using WLS 12.2.1.3 and therefore have access to the setProperty.sh script (see section II.a below), it will also not accept a symlink; it will just set the JAVA_HOME value to the target folder behind the symlink:

[weblogic@weblogic_server_01 bin]$ ./setProperty.sh -name JAVA_HOME -value /app/Java/jdk
Property JAVA_HOME successfully set to "/app/Java/jdk1.8.0_192"
[weblogic@weblogic_server_01 bin]$

 

So, in summary: don’t use a symlink. It will be pretty for a few minutes/hours/days, but in the end it won’t survive and you will therefore lose the whole purpose of using it.

 

II. Manually replace file content with the new JAVA_HOME

Here you have two choices again:

a. Minimal update

As mentioned in my previous blog, there are some specific configuration files that are used when starting components. If you want to change the path of your JAVA_HOME without too much work, a quick (& a little bit dirty) way is to update only the necessary files (a sed sketch follows the list):

  • ${DOMAIN_HOME}/bin/setDomainEnv.sh
  • ${DOMAIN_HOME}/nodemanager/nodemanager.properties
  • (WLS 12.1.x) ${MW_HOME}/oracle_common/common/bin/commEnv.sh
  • (WLS 12.2.x) ${DOMAIN_HOME}/bin/setNMJavaHome.sh
  • (WLS 12.2.1.2 and below) ${MW_HOME}/oracle_common/common/bin/commBaseEnv.sh
  • (WLS 12.2.1.3 and above) ${MW_HOME}/oui/.globalEnv.properties
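
A minimal sketch of that quick update for a WLS 12.2.1.x setup, assuming the old and new paths used later in this post (the .globalEnv.properties file is better handled with the setProperty.sh utility shown right below):

[weblogic@weblogic_server_01 ~]$ OLD_JAVA="/app/Java/jdk1.8.0_151"
[weblogic@weblogic_server_01 ~]$ NEW_JAVA="/app/Java/jdk1.8.0_192"
[weblogic@weblogic_server_01 ~]$ sed -i "s,${OLD_JAVA},${NEW_JAVA},g" ${DOMAIN_HOME}/bin/setDomainEnv.sh ${DOMAIN_HOME}/bin/setNMJavaHome.sh ${DOMAIN_HOME}/nodemanager/nodemanager.properties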

So, as you can see, if you want to update the JAVA_HOME even in a very simple WebLogic Server, you need to update three or four files, and which ones depends on the version of WebLogic that you are using…

If you are using WLS 12.2.1.3, you don’t have to update the file “${MW_HOME}/oui/.globalEnv.properties” directly; there is actually a utility provided since 12.2.1.3 that will do it for you (more information here):

[weblogic@weblogic_server_01 ~]$ cd ${MW_HOME}/oui/bin/
[weblogic@weblogic_server_01 bin]$
[weblogic@weblogic_server_01 bin]$ cat ../.globalEnv.properties
#This file is automatically generated
#Sat May 18 14:34:24 UTC 2019
JAVA_HOME=/app/Java/jdk1.8.0_151
JAVA_HOME_1_8=/app/Java/jdk1.8.0_151
JVM_64=
[weblogic@weblogic_server_01 bin]$
[weblogic@weblogic_server_01 bin]$ ./getProperty.sh JAVA_HOME
/app/Java/jdk1.8.0_151
[weblogic@weblogic_server_01 bin]$
[weblogic@weblogic_server_01 bin]$ ./setProperty.sh -name JAVA_HOME -value /app/Java/jdk1.8.0_192
Property JAVA_HOME successfully set to "/app/Java/jdk1.8.0_192"
[weblogic@weblogic_server_01 bin]$
[weblogic@weblogic_server_01 bin]$ cat ../.globalEnv.properties
#This file is automatically generated
#Sat May 18 15:02:54 UTC 2019
JAVA_HOME=/app/Java/jdk1.8.0_192
JAVA_HOME_1_8=/app/Java/jdk1.8.0_151
JVM_64=
[weblogic@weblogic_server_01 bin]$

 

This solution works, but it’s not very clean. You will be left with references to the old path in a lot of files, so at some point you (or someone else) might wonder which Java is actually used. That’s why I personally don’t really like this solution.

b. Cleaner update

Instead of updating only the minimal files, what about being more thorough and updating all the files that mention the old Java path? This requires some (not so complicated) commands to find all relevant files, excluding logs, previous patch files, binaries and a few other things, and then you can just replace the old Java path with the new one.

This is a better solution but it has a few drawbacks:

  • You absolutely need to know what you are doing and you need to test it thoroughly. Don’t run sed commands (or whatever you prefer) to update files directly if you aren’t sure about what you are doing in the first place
  • It takes some time to list all the files and update them properly, especially if there are a lot of files under $DOMAIN_HOME or $MW_HOME/$ORACLE_HOME. Even if you automate it, your Operating System still needs to go through all the existing files and, since this needs to be done while WebLogic isn’t running, it will increase the downtime needed
  • This would need to be done for each and every Java patch/update

Despite these drawbacks, I still prefer this solution because it’s much cleaner. No end-user would care if there are still references to an old Java in the WebLogic files as long as the application is working for them, but I would know that they are there and it would drive me crazy (a little bit :D).

Below are some example commands that can be used to do that. Please test them and adapt them to your needs! If you are using different or additional components, you might want to take that into account.

[weblogic@weblogic_server_01 ~]$ OLD_JAVA="/app/Java/jdk1.8.0_151"
[weblogic@weblogic_server_01 ~]$ NEW_JAVA="/app/Java/jdk1.8.0_192"
[weblogic@weblogic_server_01 ~]$
[weblogic@weblogic_server_01 ~]$ grep -R "${OLD_JAVA}" $DOMAIN_HOME | grep -vE "^[^:]*.log:|^[^:]*/logs/|^[^:]*/nohupLogs/|^[^:]*/.patch_storage/|^Binary file " > ~/listJavaBefore_DOMAIN_HOME
[weblogic@weblogic_server_01 ~]$
[weblogic@weblogic_server_01 ~]$ grep -R "${OLD_JAVA}" $ORACLE_HOME | grep -vE "^[^:]*.log:|^[^:]*/logs/|^[^:]*/nohupLogs/|^[^:]*/.patch_storage/|^Binary file " > ~/listJavaBefore_ORACLE_HOME
[weblogic@weblogic_server_01 ~]$
[weblogic@weblogic_server_01 ~]$ awk -F':' '{print $1}' ~/listJavaBefore_DOMAIN_HOME | sort -u; echo
...

[weblogic@weblogic_server_01 ~]$
[weblogic@weblogic_server_01 ~]$ awk -F':' '{print $1}' ~/listJavaBefore_ORACLE_HOME | sort -u; echo
...

[weblogic@weblogic_server_01 ~]$
[weblogic@weblogic_server_01 ~]$ while read line; do FILE_TO_UPDATE=`echo ${line} | awk -F':' '{print $1}'`; sed -i "s,${OLD_JAVA},${NEW_JAVA},g" ${FILE_TO_UPDATE}; done < ~/listJavaBefore_DOMAIN_HOME
[weblogic@weblogic_server_01 ~]$
[weblogic@weblogic_server_01 ~]$ while read line; do FILE_TO_UPDATE=`echo ${line} | awk -F':' '{print $1}'`; sed -i "s,${OLD_JAVA},${NEW_JAVA},g" ${FILE_TO_UPDATE}; done < ~/listJavaBefore_ORACLE_HOME
[weblogic@weblogic_server_01 ~]$

 

The grep commands above list all occurrences of the ${OLD_JAVA} path, the awk commands list all the files that contain one or more occurrences, and the while loops replace all occurrences with the value of ${NEW_JAVA}. Once that’s done, you can check again with the grep command, using both OLD_JAVA and NEW_JAVA, to ensure that all references to the old JAVA_HOME have been properly replaced with the new one. Instead of OLD_JAVA you can also use a regex, in case you had several Java versions in the past, so that it matches all of them and not just the last one. Something like ‘grep “${JAVA_BASE}/[a-zA-Z0-9._]*” …‘ for example, or ‘grep “${JAVA_BASE}/jdk[0-9._]*” …
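
A verification sketch, reusing the same exclusion pattern; zero remaining matches for the old path means everything was updated:

[weblogic@weblogic_server_01 ~]$ grep -R "${OLD_JAVA}" $DOMAIN_HOME $ORACLE_HOME | grep -vE "^[^:]*.log:|^[^:]*/logs/|^[^:]*/nohupLogs/|^[^:]*/.patch_storage/|^Binary file " | wc -l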

 

III. Use a generic folder

In my opinion, none of the above solutions matter because this one is the best and it should be the one that everybody uses. What I mean by a “generic folder” is simply a folder whose name never changes. For that purpose I’m usually very happy with symlinks for other applications/application servers, but for WebLogic this needs to be a real folder.

In case of an upgrade, you will need to execute both this section III and section II.b so that all references point to the generic folder. On the other hand, in case of a new installation, it’s just about using the generic folder’s path while installing WebLogic, and then you will never have anything else to do. You can see this section III as being somehow the opposite of section I.

Below, I’m just taking the state at the end of section I, where we have a symlink named “jdk” that points to “jdk1.8.0_192“. I’m changing that: removing the symlink, renaming the folder “jdk1.8.0_151” to “jdk” and then creating a new symlink named “jdk1.8.0_151” that points to “jdk”, so it’s the other way around. With that, we have a generic folder named “jdk” that can hold any Java version and we will never have to change the JAVA_HOME again:

[weblogic@weblogic_server_01 Java]$ ls -l
total 0
lrwxrwxrwx. 1 weblogic weblogic 13 May 18 14:06 jdk -> jdk1.8.0_192/
drwxr-xr-x. 1 weblogic weblogic 17 May 18 13:55 jdk1.8.0_151
drwxr-xr-x. 1 weblogic weblogic 17 May 18 14:02 jdk1.8.0_192
[weblogic@weblogic_server_01 Java]$
[weblogic@weblogic_server_01 Java]$ $JAVA_HOME/bin/java -version
java version "1.8.0_192"
Java(TM) SE Runtime Environment (build 1.8.0_192-b12)
Java HotSpot(TM) 64-Bit Server VM (build 25.192-b12, mixed mode)
[weblogic@weblogic_server_01 Java]$
[weblogic@weblogic_server_01 Java]$ rm jdk
[weblogic@weblogic_server_01 Java]$ mv jdk1.8.0_151 jdk
[weblogic@weblogic_server_01 Java]$ ln -s jdk/ jdk1.8.0_151
[weblogic@weblogic_server_01 Java]$
[weblogic@weblogic_server_01 Java]$ ls -l
total 0
drwxr-xr-x. 1 weblogic weblogic 17 May 18 13:55 jdk
lrwxrwxrwx. 1 weblogic weblogic 13 May 18 16:22 jdk1.8.0_151 -> jdk/
drwxr-xr-x. 1 weblogic weblogic 17 May 18 14:02 jdk1.8.0_192
[weblogic@weblogic_server_01 Java]$
[weblogic@weblogic_server_01 Java]$ $JAVA_HOME/bin/java -version
java version "1.8.0_151"
Java(TM) SE Runtime Environment (build 1.8.0_151-b12)
Java HotSpot(TM) 64-Bit Server VM (build 25.151-b12, mixed mode)
[weblogic@weblogic_server_01 Java]$

 

As mentioned above, at this point, if you are going to install a new WebLogic Server, just use “/app/Java/jdk” as the JAVA_HOME and you will never have to touch any WebLogic files. If you already have a WebLogic Server installed that uses something other than “/app/Java/jdk“, then you will have to go through section II.b. Section II.b is designed to be executed each time you change your Java version, but here, since the folder name will always stay the same, you only need to do it once.

To upgrade your version of Java once you are using the generic folder, it’s that simple:

[weblogic@weblogic_server_01 Java]$ ls -l
total 0
drwxr-xr-x. 1 weblogic weblogic 17 May 18 13:55 jdk
lrwxrwxrwx. 1 weblogic weblogic 13 May 18 16:22 jdk1.8.0_151 -> jdk/
drwxr-xr-x. 1 weblogic weblogic 17 May 18 14:02 jdk1.8.0_192
[weblogic@weblogic_server_01 Java]$
[weblogic@weblogic_server_01 Java]$ $JAVA_HOME/bin/java -version
java version "1.8.0_151"
Java(TM) SE Runtime Environment (build 1.8.0_151-b12)
Java HotSpot(TM) 64-Bit Server VM (build 25.151-b12, mixed mode)
[weblogic@weblogic_server_01 Java]$
[weblogic@weblogic_server_01 Java]$ rm jdk1.8.0_151
[weblogic@weblogic_server_01 Java]$ mv jdk jdk1.8.0_151
[weblogic@weblogic_server_01 Java]$ mv jdk1.8.0_192 jdk
[weblogic@weblogic_server_01 Java]$ ln -s jdk/ jdk1.8.0_192
[weblogic@weblogic_server_01 Java]$
[weblogic@weblogic_server_01 Java]$ ls -l
total 0
drwxr-xr-x. 1 weblogic weblogic 17 May 18 14:02 jdk
drwxr-xr-x. 1 weblogic weblogic 17 May 18 13:55 jdk1.8.0_151
lrwxrwxrwx. 1 weblogic weblogic 13 May 18 16:28 jdk1.8.0_192 -> jdk/
[weblogic@weblogic_server_01 Java]$
[weblogic@weblogic_server_01 Java]$ $JAVA_HOME/bin/java -version
java version "1.8.0_192"
Java(TM) SE Runtime Environment (build 1.8.0_192-b12)
Java HotSpot(TM) 64-Bit Server VM (build 25.192-b12, mixed mode)
[weblogic@weblogic_server_01 Java]$

 

Once that’s done, simply restart the WebLogic processes and you are done: Java has been upgraded without effort, the WebLogic files still reference the correct JAVA_HOME, and everything stays clean, efficient and working.

 

This article, WebLogic – Upgrade of Java, appeared first on Blog dbi services.
