As seen in this question, I can use the following SQL command:
CREATE DATABASE newdb WITH TEMPLATE olddb;
in order to replicate a database on the same machine. But in my case the new database should contain only a subset of the data from the first database, selected by custom queries.
Let's suppose my current database has the following tables:
|- ohimesama:
|-- id PK SERIAL
|-- name VARCHAR(100)
|
|- oujisama:
|-- id PK SERIAL
|-- name VARCHAR(100)
|
|- ohimesama_oujisama_suki:
|-- ouji_id FK oujisama
|-- ohime_id FK ohimesama
And in my new database I want to keep only the records from those tables that match the following queries:

SELECT * FROM ohimesama WHERE id IN (
    SELECT ohime_id FROM ohimesama_oujisama_suki
    JOIN oujisama ON oujisama.id = ouji_id
    WHERE oujisama.name = 'WILLIAM');

SELECT * FROM oujisama WHERE id IN (
    SELECT ouji_id FROM ohimesama_oujisama_suki
    JOIN ohimesama ON ohimesama.id = ohime_id
    WHERE ohimesama.name = 'MARRIE');
My database is rather large (~100 GB), so truncating or deleting rows across the whole database is rather time-consuming. Do you have any idea how to solve this problem?
As you can guess, both databases are on the same server. The reason I want to do this is to keep a subset of the data for software development.
Follow up: I have issues setting up the JDBC driver. I get the following error:
No suitable driver for jdbc:postgresql://
…though I have installed the PostgreSQL JDBC driver via the GNU/Linux apt command and specified the correct directory.
I have been through the same problem recently and can comment on some attempts and my conclusions after a lot of research.
In my case I had to partition some data that would be distributed across different databases. Note that we are dealing with the same fundamental problem: partial data migration.
The first problem is data extraction:
pg_dump does not provide any dynamic, consistent partial data extraction mechanism.
You can do a full backup and then erase the data you don't want, but as you have already realized, this is a costly task that demands a lot of runtime for large volumes of data. And depending on the complexity of your database structure, assembling the delete scripts can be quite difficult. What's more, if your database is constantly evolving, this method is not reusable the next time you want to extract partial data.
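For illustration, here is what such a delete script might look like for the three tables in the question (a sketch only; the 'WILLIAM' filter and the execution order are assumptions, and a real database would need one hand-written rule per table):

```shell
# Write the prune script; child tables must be pruned before their
# parents, and every table needs its own hand-written rule.
cat > prune.sql <<'EOF'
DELETE FROM ohimesama_oujisama_suki
 WHERE ouji_id NOT IN (SELECT id FROM oujisama WHERE name = 'WILLIAM');
DELETE FROM ohimesama
 WHERE id NOT IN (SELECT ohime_id FROM ohimesama_oujisama_suki);
DELETE FROM oujisama WHERE name <> 'WILLIAM';
EOF
# psql newdb -f prune.sql    # slow on a ~100 GB clone, and not reusable
```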
You will find many suggestions to use the
COPY feature, but all in all I can say it is too limited for the complexity of this task: you would need to assemble consistent SELECTs across all tables and execute them in a single transaction. Again, since your database is dynamic and constantly evolving, this method is not reusable either, because you would have to rewrite all the SELECTs every time you extract partial data.
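As a sketch of why the COPY route gets complicated: even for the three small tables from the question you must hand-assemble consistent per-table extracts and run them in one transaction (the 'WILLIAM' filter and the output paths are assumptions):

```shell
# Assemble the per-table COPY statements in a single repeatable-read
# transaction so all three extracts see one consistent snapshot.
cat > extract_subset.sql <<'EOF'
BEGIN ISOLATION LEVEL REPEATABLE READ;
CREATE TEMP TABLE keep_ouji AS
  SELECT id FROM oujisama WHERE name = 'WILLIAM';
COPY (SELECT * FROM oujisama WHERE id IN (SELECT id FROM keep_ouji))
  TO '/tmp/oujisama.csv' CSV;
COPY (SELECT * FROM ohimesama_oujisama_suki
       WHERE ouji_id IN (SELECT id FROM keep_ouji))
  TO '/tmp/suki.csv' CSV;
COPY (SELECT * FROM ohimesama
       WHERE id IN (SELECT ohime_id FROM ohimesama_oujisama_suki
                     WHERE ouji_id IN (SELECT id FROM keep_ouji)))
  TO '/tmp/ohimesama.csv' CSV;
COMMIT;
EOF
# psql olddb -f extract_subset.sql   # every schema change breaks this file
```

Note that every new table or foreign key forces you to rewrite this script, which is exactly the reusability problem described above.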
Usually a partial data migration starts from one specific table that restricts the other data; let's call it the restrictive table. With that in mind, we can consider a few more possibilities.
Let's assume your primary database has a table called user (the restrictive table) with 100 records, and you want to migrate all records that link to the first 10 records in that table.
- We can dump/restore the structure only.
- We can generate an SQL file with the data from the 10 user records we need and insert it into the new database.
- Then we can dump all remaining data as plain SQL, disregarding the user table. We run this script without a transaction block and let the foreign-key constraint checks do the dirty work.
- As in the previous case, this can take a long execution time depending on the volume of data, but this solution is functional and reusable.
Note that your database must not contain cyclic foreign-key dependencies (this is the condition for a data-only dump to restore correctly).
Also note that the dependency level of your restrictive table must be as low as possible; otherwise this methodology needs an adaptation that I will not describe here, because this answer is already getting long.
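The steps above could be scripted roughly like this (the olddb/newdb names and the 10-row filter on user are assumptions from the example; review before running against real data):

```shell
cat > partial_migrate.sh <<'EOF'
#!/bin/sh
# 1. Structure only.
pg_dump --schema-only olddb | psql newdb

# 2. Seed the restrictive table with just the 10 rows we want to keep.
psql olddb -c "COPY (SELECT * FROM \"user\" ORDER BY id LIMIT 10) TO STDOUT CSV" \
  | psql newdb -c "COPY \"user\" FROM STDIN CSV"

# 3. Dump the rest as plain INSERTs and replay without a transaction
#    block, letting foreign-key violations skip out-of-subset rows.
pg_dump --data-only --inserts --exclude-table=user olddb > rest.sql
psql newdb -f rest.sql
EOF
chmod +x partial_migrate.sh
```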
That said, what really worked for me was the Jailer tool. It is developed in Java and maps the foreign-key constraints of your database so you can assemble a data extraction model as you configure it.
The output is a plain SQL script with partial data consistent with the restrictive rules you have set.
If you have any questions about its use I can help you understand and set up your configuration environment.
A similar question was asked on Stack Overflow.
I have issues setting up the JDBC driver
Jailer already ships with JDBC drivers for several databases (PostgreSQL, MySQL, MSSQL, etc.).
They are located in the "lib" subdirectory, so there is no need to update them unless there is an issue.
The PostgreSQL JDBC driver shipped with the current Jailer version should be new enough to handle the task.
After adding and naming your data model (Jailer GUI –> "Database subsetting tool") and clicking "Analyze Database", you get the "Connect with Database" dialog, where you add a new connection.
After hitting the "New" button, a dialog pops up where you select the database. Scroll down to "PostgreSQL JDBC Driver 42.2.0" (the one shipped with Jailer v8.8.4, released 2019-08-20), select the line, and hit the "OK" button.
You now get the dialog to enter your connection string, username, password, etc.
Here the JDBC driver jar should be preselected (on Windows "lib\postgresql-42.2.0.jre7.jar", on Linux "lib/postgresql-42.2.0.jre7.jar").
So you only need to fill out the last three rows of the dialog (your DB-URL, the Username and the Password).
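If the "No suitable driver" error from the follow-up persists, double-check the DB-URL: the PostgreSQL JDBC driver expects the form below (host, port, and database name are placeholders):

```
jdbc:postgresql://localhost:5432/olddb
```

DriverManager reports "No suitable driver" when no registered driver accepts the URL, so an incomplete or misspelled `jdbc:postgresql://` prefix produces exactly that message.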
When done you can test the connection by hitting the “Test Connection” button.
It should work, barring restrictions on where the database accepts connections from (check –> pg_hba.conf).
If you still think you need the latest version of the JDBC driver, go to https://jdbc.postgresql.org/ and download it directly from the source (currently postgresql-42.2.6.jar, or one of the other builds for older JREs).
You can then drop the driver directly into the "lib" subdirectory of the Jailer installation directory.