paint-brush
How to Migrate from Oracle to PostgreSQLby@intelligentconverters
32,948 reads
32,948 reads

How to Migrate from Oracle to PostgreSQL

by Intelligent ConvertersMay 11th, 2023
Read on Terminal Reader
Read this story w/o Javascript

Too Long; Didn't Read

This whitepaper covers experience of database migration from Oracle to PostgreSQL for a large cargo delivery company. The key reason of migration was reducing cost of the database management system. There were 3 specialists involved in the migration: 2 developers and 1 database administrator. The migration includes six basic phases: Find all Oracle-specific methods of storing and processing data.
featured image - How to Migrate from Oracle to PostgreSQL
Intelligent Converters HackerNoon profile picture
0-item
1-item

This whitepaper covers experience of database migration from Oracle to PostgreSQL made by Intelligent Converters for a large cargo delivery company. The key reason of migration was reducing cost of the database management system and transferring to open-source system with sufficient scalability, security and customization capabilities.


Here is brief specification of the source database:


  • Database server Oracle 12g

  • Database contains 190 tables, 50 GB of data

  • 120000 lines of PL/SQL in stored procedures, functions and triggers


The project duration was 2 months: a half month of assessment and planning, one month for migration and a half month for testing. There were 3 specialists involved in the migration: 2 developers and 1 database administrator.


This Oracle to PostgreSQL database migration includes six basic phases:

  1. Find all Oracle-specific methods of storing and processing data in the source database and the scope of use (investigation and planning phase)
  2. Select the appropriate tools for schema migration and implement it
  3. Choose the most suitable method of data migration to decrease Oracle system downtime
  4. Run the data migration handling all transformations required by PostgreSQL DBMS
  5. Convert all PL/SQL code into PostgreSQL format (using tools for partial automaton and manual post-processing)
  6. Run performance and functional tests, fine-tuning of the resulting database

Migration of Table Definitions

There are some data types in Oracle having no direct equivalent in PostgreSQL. One of such types is DATE containing both date and time parts.


PostgreSQL supports:

  • date – pure date without time part
  • time – pure time without date part with time zone optional specification
  • timestamp – date and time with time zone optional specification


There are two options of mapping Oracle dates into PostgreSQL: either use TIMESTAMP or set up orafce extension to use Oracle-style date type oracle.date.


Spatial types also require special attention. Oracle has built-in type SDO_GEOMETRY while PostgreSQL needs PostGIS installation to work with spatial data types.


The table below illustrates Oracle to PostgreSQL safe type mapping:

Oracle

PostgreSQL

BINARY_FLOAT

REAL

BINARY_INTEGER

INTEGER

BINARY_DOUBLE

DOUBLE PRECISION

BLOB, RAW(n), LONG RAW

BYTEA (1GB limit)

CLOB, LONG

TEXT (1GB limit)

DATE

TIMESTAMP

NUMBER, NUMBER(*)

DOUBLE PRECISION or BIGINT if it is a part of Primary Key

NUMBER(n,0), NUMBER(n)

n<5 – SMALLINT5<=n<9 – INT9<=n<19 – BIGINTn>=19 – DECIMAL(n)

NUMBER(p,s)

DECIMAL(p,s)

REAL

DOUBLE PRECISION


For numeric types it is important to understand the scope of use in database. If it is focused on accuracy, Oracle numeric types must be mapped in PostgreSQL NUMERIC. If the top priority is calculation speed, the best mapping would be REAL or DOUBLE PRECISION.


We used Oracle-to-PostgreSQL converter to automate migration of table definitions, indexes and constrains for this project. It maps Oracle types into the most appropriate PostgreSQL equivalents and allows to customize particular type mapping.

Data Migration

Since data migration may consume much time for large databases, it is extremely important to choose right strategy and tools for this step.


There are three common approaches to the data migration:

  • snapshot – migrate all data at one step
  • piecewise snapshot – migrate data by chunks in parallel threads or processes
  • change data replication – continuous loading data by tracking incremental changes


Snapshot method requires essential downtime of the source database for the entire period of reading data to avoid data loss or corruption during migration. The downtime for piecewise snapshot approach is much less but still required. In this migration project we used piecewise snapshot method for migration of large tables containing millions of rows.


So, what about change data replication (CDR) technique?

It has two major implementations and each of them has its weakness. First one is known as trigger-based CDR. It requires creating triggers on insert, updates and delete for each table being replicated in the source database. Those triggers track all changes by storing information about changing events into special ‘history’ table. Based on this data the CDR tool replicates all changes to the target database. Trigger-based CDR causes extra overhead of the source database due to writing into ‘history’ table for every data update.


The second method is called Transaction Log CDR. It uses transaction logs created by Oracle DBMS to track and replicate changes into the target PostgreSQL database. The advantage of this approach compared to trigger-based CDR is that it does not modify the source database.


However, Transaction Log CDR has some weaknesses as well:

  • Oracle constantly archives the transaction log files, so CDC tool may lose some changes if does not read transaction log before it is archived.
  • In case of losing connection to the target database while replicating changes from transaction log (or any other kind of error), data may be lost or damaged due to lack of control over the transaction log.


The most challenging aspects of data migration from Oracle to PostgreSQL are: specific data formats of the source DBMS having no direct equivalent in the target and the external data.


In the previous section of this article BYTEA is stated as the most suitable PostgreSQL data type for binary data. However, when migrating large binary data (average field size is not less than 10MB), it is not recommended to use BYTEA. The reason is particularity of reading BYTEA data – it only can be extracted in one fragment; piecewise reading is not possible. This may cause essential RAM overhead. PostgreSQL type LARGE OBJECT may be used as workaround for this issue. All values of LARGE OBJECT type are stored in the system table ‘pg_largeobject’ that is a part of each database. There can be up to 4 billion of rows in table ‘pg_largeobject’. Max size of LARGE OBJECT is 4TB and piecewise reading is available.


Also, it is important to properly migrate ROWID that is pseudo-column identifying physical address of record in the table. PostgreSQL has a similar service field called ctid, however it is a direct equivalent of ROWID. PostgreSQL documentation says that ctid might be changed due to the vacuuming procedure.


There are three basic methods of emulating ROWID functionality in PostgreSQL:

  • Use existing primary key (or create new one) to identify rows instead of ROWID
  • Add a serial or bigserial column with auto-generated values and make it a primary/unique key to replace ROWID functionality
  • When it is not possible to define a single column primary key, use unique index built over multiple columns (for this purpose determine minimal unique set of fields for every row)


Oracle may link external data that is stored outside the database. However, it can be processed as a regular table using ‘external table’ feature of the DBMS. PostgreSQL uses Foreign Data Wrapper library for the same purpose. For example, it provides extension ‘file_fdw’ to work with external CSV files as a regular table.

Challenges of PL/SQL Migration

This section explores issues that was resolved during migration of Oracle packages, stores procedures, functions and triggers to PostgreSQL.


Packages. PostgreSQL does not have packages. This missing feature can be emulated by grouping all entries belong to one package inside PostgreSQL schema with the same name, global variables can be stored in a service table.


Empty strings. Unlike Oracle, in PostgreSQL empty strings are not equal to NULL. For example, in Oracle:

SQL> select NVL('','This is null') as null_test from dual;

NULL_TEST
------------
This is null

The same in PostgreSQL:

testdb=# select coalesce('','This is null') as null_test;
 null_test
-----------

(1 row)

Also, there is a different behavior of string concatenation with NULL in Oracle and PostgreSQL. In Oracle:

SQL> Select 'String' || NULL as cs from dual;

CS
--
String

In PostgreSQL:

synctest=# Select 'String' || NULL as concat_str;
 concat_str
------------

(1 row) 


Types casting. PostgreSQL requires strict types casting when calling functions, operators or when INSERT and UPDATE statements place the results of expressions into a table.


The workaround is to use pseudo-type ‘anyelement’:

create or replace function my_concat(str1 anyelement, str2 anyelement)
returns varchar 
language plpgsql 
as $$
begin
return str1::varchar || str2::varchar;
end;
$$;

Note: you can either use one anyelement argument in stored procedure/function or all parameters of the same type anyelement.


Sequences. Syntax of sequence declaration is similar for Oracle and PostgreSQL. For example:

CREATE SEQUENCE SEQ1 START WITH 10 INCREMENT BY 2 MAXVALUE 1000000 CACHE 20 NO CYCLE;


However, referencing the next value of sequence is different. In Oracle it is ‘sequence.nextval’ and in PostgreSQL nextval(‘sequence’).


Triggers. In Oracle the source code of trigger’s body in included in CREATE TRIGGER statement. PostgreSQL requires that trigger source code is compose as standalone function with the reference from CREATE TRIGGER statement:

CREATE OR REPLACE FUNCTION store_changes()
  RETURNS TRIGGER 
  LANGUAGE PLPGSQL
  AS
$$
BEGIN
	IF (NEW.first_name <> OLD.first_name) OR (NEW.last_name <> OLD.last_name) OR (NEW.email <> OLD.email)
THEN
		 INSERT INTO changes_log(id,changed_on)
		 VALUES(OLD.id,now());
	END IF;
	RETURN NEW;
END;
$$

CREATE TRIGGER make_changes
  BEFORE UPDATE
  ON employees
  FOR EACH ROW
  EXECUTE PROCEDURE store_changes();


Autonomous Transactions. Autonomous transactions in Oracle allow a subprogram to commit or rollback SQL operations without committing or rolling-back the main transaction. For example, some data is inserted into a table as a part of insert-trigger.


This insert operation must succeed even if main transaction of insert-trigger is failed. In this case the corresponding INSERT statement must be enclosed in autonomous transaction:

CREATE OR REPLACE PROCEDURE insert_critical_data(v_data varchar2)
IS
PRAGMA AUTONOMOUS_TRANSACTION;
BEGIN
    INSERT INTO info_table (data) VALUES (v_data);
    commit;
END;


Autonomous Transactions are not supported in PostgreSQL directly. One possible workaround is to refactor the code, so it does not need autonomous transactions.


Otherwise, you can use module ‘dblink‘ of PostgreSQL. With dblink, a new connection is established, and a query is executed using this connection and is immediately committed, irrespective of the main transaction. For example, the following function inserts a row into a table, and the insert operation will be committed even if the calling transaction is rolled back:


CREATE OR REPLACE FUNCTION insert_critical_data(v_data TEXT)
RETURNS void
AS
$BODY$
DECLARE
    v_sql text;
BEGIN
    PERFORM dblink_connect('myconn', 
         'dbname=mydbname user=… password=… host=… port=…');
    v_sql := format('INSERT INTO info_table (data) 
         VALUES (%L)', v_data);
    PERFORM dblink_exec('myconn', v_sql);
    PERFORM dblink_disconnect('myconn');
END;
$BODY$
LANGUAGE  plpgsql;


Built-in Functions. Oracle and PostgreSQL provide similar but not equal sets of built-in functions. The table below includes Oracle functions that require porting into PostgreSQL equivalents:

Oracle

PostgreSQL

ADD_MONTH($date,$n_month)

$date + $n_month  * interval '1 month'

DECODE($exp, $when, $then, ...)

CASE $exp WHEN $when THEN $then ... END

INSTR($str1, $str2)

*POSITION($str2 in $str1)

ROWNUM

**row_number() over ()

SYSDATE

CURRENT_DATE

SYS_GUID()

uuid_generate_v1()

*Complex porting of Oracle INSTR function to PostgreSQL can be found at: https://www.postgresql.org/docs/9.0/static/plpgsql-porting.html#PLPGSQL-PORTING-APPENDIX


Oracle condition ‘where rownum < N’ must be converted into ‘limit N’ in PostgreSQL We used Oracle-to-PostgreSQL Code Converter to partially automate conversion of triggers, stored procedures and functions. It supports conversion of Oracle built-in functions into PostgreSQL equivalents and processes most syntax patterns of PL/SQL. However, complex source code may require manual post-processing of the output.

Useful Modules for Oracle to PostgreSQL Migration

This section contains brief description of PostgreSQL modules that can help to migrate from Oracle to PostgreSQL and test the results of migration.


One of the most important PostgreSQL modules when migrating from Oracle is orafce. It emulates specific types, functions and operators of Oracle database management system.


Extensions ‘pgTAP’ and ‘pg_prove’ can be used for functional testing of PostgreSQL functions. After installing ‘pgTAP’, it creates a lot of stored functions used for writing tests. Test function must be declared as ‘returns setof text’. Then it can be run from psql console client:

psql -h $db_host -p $db_port -U $db_user $db_name -f tests/name_of_test.sql


Detailed information about pgTAP can be found here: http://pgtap.org/documentation.html


PostgreSQL extension plpgsql_check helps to validate PL/pgSQL code. For example, if there is missing variable declaration or misspelled column name inside PostgreSQL function, this will definitely fail during execution. Use capabilities of ‘plpgsql_check’ to diagnose such issues before going to production:

select * from plpgsql_check_function_tb('{name of function}');


Module plprofiler helps to analyze performance of PL/pgSQL code in functions and stored procedures.


PostgreSQL extension oracle_fdw provides connection to Oracle databases through Foreign Data Wrapper feature of DBMS. For example, if Oracle database ‘OCRL’ is running on the server ‘server.mydomain.com’, oracle_fdw must be configured as follows:


CREATE EXTENSION oracle_fdw;

CREATE SERVER oraserver FOREIGN DATA WRAPPER oracle_fdw OPTIONS (dbserver '//server.mydomain.com:1521/ORCL');

GRANT USAGE ON FOREIGN SERVER oraserver TO pguser;

CREATE USER MAPPING FOR pguser SERVER oraserver OPTIONS (user 'oracle user', password 'oracle password');

CREATE FOREIGN TABLE oratable ( { column definitions } ) SERVER oraserver OPTIONS (schema 'ORAUSER', table 'ORATAB');


Then, you can use ‘oratable’ as a regular PostgreSQL table.