Before initiating the build and installation process, ensure the following prerequisites are met:

1. Java version 21 or higher.
2. Maven 3.9 or higher.
3. Postgres version 15 or higher (to use for the pgCompare data compare repository).
4. Necessary JDBC drivers (DB2, Postgres, MySQL, MSSQL, and Oracle are currently supported).
5. Postgres connections must not go through a connection pooler such as pgBouncer.

## Limitations

The following are current limitations of the compare utility:

2. Unsupported data types: blob, long, longraw, bytea.
3. Limitations with the boolean data type when performing cross-platform compares.

## Upgrading

### Version 0.3.0

#### Enhancements / Fixes

- Added support for DB2.
- Case-sensitive table and column names are now supported.
- Replaced the JSON object used for column mapping with new tables for easier management.
- Added Projects to allow multiple configurations to be stored in the repository instead of managing multiple properties files.

#### Upgrading to 0.3.0

Due to the changes required for the repository, the **repository must be dropped and recreated** to upgrade to version 0.3.0.

## Compile
Once the prerequisites are met, begin by forking the repository and cloning it to your host machine:
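A typical sequence, assuming a standard Maven build and using a placeholder fork URL, is sketched below:

```shell
# Clone your fork (placeholder URL) and build the jar with Maven.
git clone https://github.com/<your-account>/pgCompare.git
cd pgCompare
mvn clean package
```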
Copy the `pgcompare.properties.sample` file to `pgcompare.properties` and define the required properties.

By default, the application looks for the properties file in the execution directory. Use the PGCOMPARE_CONFIG environment variable to override the default and point to a file in a different location.
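For example, to point pgCompare at a properties file kept elsewhere (the path is only illustrative):

```shell
# Run pgCompare with a properties file outside the execution directory.
PGCOMPARE_CONFIG=/etc/pgcompare/pgcompare.properties java -jar pgcompare.jar --batch=0
```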
At a minimum, the repo-xxxxx parameters are required in the properties file (or specified as environment variables). Besides the properties file and environment variables, settings can also be stored in the `dc_project` table, in the `project_config` column, as JSON ({"parameter": "value"}).
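As a sketch of the `dc_project` option, a documented setting such as batch-fetch-size can be stored for the default project (pid = 1, as described in the Projects section below); the value here is only an example:

```sql
-- Example only: store a parameter as JSON in the default project's configuration.
UPDATE dc_project
   SET project_config = '{"batch-fetch-size": "2000"}'
 WHERE pid = 1;
```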
## Configure Repository Database
pgCompare requires a hosted Postgres repository. To configure it, connect to a Postgres database and execute the provided pgCompare.sql script in the database directory. The repository may also be created using the `--init` flag.
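Either approach below creates the repository objects; the connection details are placeholders, and the `--init` run assumes the repo-xxxxx properties are already configured:

```shell
# Option 1: run the provided script against the repository database (placeholder connection details).
psql -h localhost -U postgres -d pgcompare -f database/pgCompare.sql

# Option 2: have pgCompare create the repository objects itself.
java -jar pgcompare.jar --init
```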
In the database directory are two scripts to deploy a sample table (EMP) to Oracle and Postgres.

## Defining Table Mapping
The initial step involves defining a set of tables to compare, achieved by inserting rows into the `dc_table` and `dc_table_map` tables in the pgCompare repository. This is best done using the automated process described below.
### Automated Table Registry
Use pgCompare to perform a discovery against the target database and populate `dc_table` with the results using the following command. The schemas specified in the properties file are used for the discovery process.
```shell
java -jar pgcompare.jar --discover
```
### Manual Table Registry
Example of loading a row into `dc_table` and `dc_table_map`:
```sql
INSERT INTO dc_table (table_alias)
VALUES ('emp');

INSERT INTO dc_table_map (tid, dest_type, schema_name, table_name)
VALUES (1, 'source', 'hr', 'emp');

INSERT INTO dc_table_map (tid, dest_type, schema_name, table_name)
VALUES (1, 'target', 'HR', 'EMP');
```
After populating the list of tables, run the following to automatically map columns.
```shell
java -jar pgcompare.jar --batch=0 --maponly
```
### Projects
Projects allow the repository to maintain different mappings for different compare objectives, so a central pgCompare repository can be used for multiple compare projects. Each table has a `pid` column, which is the project id. If no project is specified, the default project (pid = 1) is used.
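As an illustration only (this assumes `dc_table` exposes `pid`, `tid`, and `table_alias` columns, per the registry examples above), the tables registered under each project can be listed directly from the repository:

```sql
-- List registered tables by project; pid = 1 is the default project.
SELECT pid, tid, table_alias
  FROM dc_table
 ORDER BY pid, tid;
```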
## Perform Data Compare
With the table mapping defined, execute the comparison and provide the mandatory batch command line argument:
```shell
java -jar pgcompare.jar --batch=0
```
Using a batch value of 0 will execute the action for all batches. The batch number may also be specified using the environment variable PGCOMPARE-BATCH. The default value for batch number is 0 (all batches).

If discrepancies are detected, run the comparison with the `--check` option:
```shell
java -jar pgcompare.jar --batch=0 --check
```
This recheck process is useful when transactions may be in flight during the initial comparison. The recheck only checks the rows that have been flagged with a discrepancy. If the rows still do not match, details will be reported. Otherwise, the rows will be cleared and marked in-sync.
## Column Map
The system automatically generates a column mapping during the first execution on a table. The mapping is stored in the `dc_table_column` and `dc_table_column_map` repository tables. The mapping can be created ahead of time, or the generated mapping can be modified as needed. If a column mapping is present, the program will not perform a remap unless instructed to with the `--maponly` flag.

To create or overwrite the current column mappings stored in the repository, run pgCompare with the `--maponly` flag as shown above. Only primary key columns and columns with a status equal to 'compare' will be included in the final data compare.
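As a sketch (assuming the status flag described above lives on `dc_table_column`; no other column semantics are implied), the columns currently flagged for comparison can be reviewed with:

```sql
-- Review which mapped columns are currently flagged to be compared.
SELECT *
  FROM dc_table_column
 WHERE status = 'compare';
```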
## Properties
Properties are categorized into four sections: system, repository, source, and target. Each section has specific properties, as described in detail in the documentation. The properties can be specified via a configuration file, environment variables, or a combination of both. To use environment variables, the variable name is the name of the property in upper case, prefixed with "PGCOMPARE-". For example, batch-fetch-size can be set by using the environment variable PGCOMPARE-BATCH-FETCH-SIZE.
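Because the documented variable names contain hyphens, most shells will not accept them with a plain `export`; one way to set them for a single run (shown as a sketch, with an example value) is with `env`:

```shell
# Example only: set a hyphenated pgCompare environment variable for one invocation.
env "PGCOMPARE-BATCH-FETCH-SIZE=5000" java -jar pgcompare.jar --batch=0
```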
- batch-fetch-size: Sets the fetch size for retrieving rows from the source or target database.
- batch-commit-size: The commit size controls the array size and number of rows concurrently inserted into the dc_source/dc_target staging tables.
- batch-progress-report-size: Defines the number of rows between progress reports (progress is reported each time the row count reaches a multiple of this value).
- database-source: Determines whether rows are sorted by primary key on the source/target database. If set to true (the default), the rows are sorted there before being compared. If set to false, the sorting takes place in the repository database.
- loader-threads: Sets the number of threads to load data into the temporary tables. Default is 4. Set to 0 to disable loader threads.
- log-level: Controls the amount of log messages written to the log destination.
- log-destination: Location where log messages will be written. Default is stdout.
- message-queue-size: Size of the message queue used by the loader threads (number of messages). Default is 100.
- number-cast: Defines how numbers are cast for the hash function (notation|standard). Default is notation (for scientific notation).
- observer-throttle: Set to true or false, instructs the loader threads to pause and wait for the observer thread to catch up before continuing to load more data into the staging tables.
- observer-throttle-size: Number of rows loaded before the loader thread will sleep and wait for clearance from the observer thread.
- observer-vacuum: Set to true or false, instructs the observer whether to perform a vacuum on the staging tables during checkpoints.

### Repository
- repo-dbname: Repository database name.

### Source
- source-database-hash: True or false, instructs the application where the hash should be computed (on the database or by the application).
- source-dbname: Database or service name.
- source-host: Database server name.
- source-password: Database password.
- source-port: Database port.
- source-schema: Name of the schema that owns the tables.
- source-sslmode: Set the SSL mode to use for the database connection (disable|prefer|require).
- source-type: Database type: oracle, postgres.
- source-user: Database username.

### Target
- target-database-hash: True or false, instructs the application where the hash should be computed (on the database or by the application).
- target-dbname: Database or service name.
- target-host: Database server name.
- target-password: Database password.
- target-port: Database port.
- target-schema: Name of the schema that owns the tables.
- target-sslmode: Set the SSL mode to use for the database connection (disable|prefer|require).
- target-type: Database type: oracle, postgres.
- target-user: Database username.

## Property Precedence
The system contains default values for every parameter. These can be overridden using environment variables, a properties file, or values saved in the `dc_project` table. The following is the order of precedence, with later sources overriding earlier ones:

- Default values
- Properties file
- Environment variables
- Settings stored in the `dc_project` table

# Data Compare Concepts
pgCompare stores a hash representation of primary key columns and other table columns, reducing row size and storage demands. The utility optimizes network traffic and speeds up the process by using hash functions when comparing similar platforms.
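Conceptually, the idea can be sketched as follows; this is only an illustration with hypothetical column names, not pgCompare's actual hash implementation:

```sql
-- Conceptual sketch on the sample EMP table: one hash for the primary key and one
-- for the remaining compared columns; the two sides are matched on these values.
SELECT md5(emp_id::text)                                  AS pk_hash,
       md5(concat_ws('|', first_name, last_name, salary)) AS column_hash
  FROM hr.emp;
```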
A summary is printed at the end of each run. To view results at a later time, the run results retained in the pgCompare repository can be queried.