Using On-Premises SQL Data Warehouse Data for OLTP Applications

Question :

Quick information: our EDW group has built an on-premises SQL data warehouse by collecting data from several source systems. These SQL DW databases look like this:

  • SQL version: SQL Server 2016 (SP2)
  • Partitioning: by year
  • Total database size: 10 TB
  • HA: Always On availability group (1 primary, 1 secondary)
  • Environment: VMware
  • Change Tracking enabled on a couple of tables in the EDW

The EDW developers have created views and stored procedures for end users (BI analytics, applications, etc.) to consume the data. That way, whatever changes the EDW group makes to the underlying objects do not directly affect end users. The application developers (the OLTP apps) consume these views and stored procedures; they are not allowed to modify the EDW objects, nor to run ad-hoc queries from the OLTP applications, because it is also critical for the EDW group to finish their loads within certain hours. We are looking at these views and stored procedures to see whether we can tune them for performance.
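To make that abstraction layer concrete, here is a hedged sketch (all object names are invented for illustration) of the kind of wrapper the EDW group might expose, so they can tune or re-point the underlying objects without breaking consumers:

```sql
-- Hypothetical abstraction layer: consumers see only the view and the
-- procedure, so the EDW team can change the underlying tables freely.
create view edw.v_sales_recent
as
select sale_id, customer_id, sale_date, amount
from edw.fact_sales                -- underlying table name is an assumption
where sale_date >= dateadd(year, -1, cast(current_timestamp as date));
go

create procedure edw.usp_get_sales_for_customer
    @customer_id int
as
begin
    set nocount on;
    select sale_id, sale_date, amount
    from edw.v_sales_recent
    where customer_id = @customer_id;
end;
```

Consumers call `edw.usp_get_sales_for_customer`, and the EDW team is free to repartition or re-index `edw.fact_sales` behind it.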

From what I have read online so far, it is considered bad practice to use an EDW for an OLTP workload: OLTP applications are chatty, and they mostly need recent data rather than deep history.

Question: what do you all suggest for using our current EDW data in these OLTP applications, so that we are not sacrificing load times for performance, or vice versa?

Answer :

Can you have your cake and eat it, too?


But there are some things you can do. I've worked with an EHR product called NextGen. It has an OLTP database that never deletes or archives any data, so it's also the data warehouse.

Short story: it sucks.

Less-short story: you can keep things kinda-not-slow if you cluster every table on some kind of date column, so the most recent data is at the top of the index.

You also have to pay attention to DUIs (deletes, updates, inserts), making sure they don't lock tables for too long at a time. I would sometimes resort to cursors, which made the overall duration of the DUI longer but reduced the time each individual lock was held.
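The same trade-off can be had without a cursor by batching. A minimal sketch, assuming a hypothetical `dbo.audit_log` table with a `create_timestamp` column: instead of one set-based DELETE that holds its locks for the whole run, delete in small batches so each statement takes only a short lock.

```sql
-- Hypothetical example: purge old rows in small batches.
-- Each DELETE locks briefly instead of holding one long table lock.
declare @rows int = 1

while @rows > 0
begin
    delete top (5000) from dbo.audit_log
    where create_timestamp < dateadd(year, -2, current_timestamp)

    set @rows = @@rowcount

    -- Brief pause so concurrent OLTP work can get in between batches.
    waitfor delay '00:00:01'
end
```

Total duration goes up, but each individual lock is short, which is usually what the OLTP side cares about.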

I also avoided triggers, using jobs that run every 5-15 minutes instead, whenever possible.
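Since your EDW already has Change Tracking enabled on some tables, such a polling job can pull only the rows changed since its last run instead of rescanning everything. A hedged sketch, where `dbo.some_edw_table` and the `dbo.sync_state` bookkeeping table are invented for illustration:

```sql
-- Hypothetical job body: pull rows changed since the last sync.
-- Assumes Change Tracking is enabled on dbo.some_edw_table and that
-- dbo.sync_state stores the last version number we processed.
declare @last_version bigint, @current_version bigint

select @last_version = last_sync_version
from dbo.sync_state
where table_name = 'some_edw_table'

set @current_version = change_tracking_current_version()

-- CHANGETABLE returns the primary-key values of changed rows;
-- join back to the base table to get the current column values.
select t.*
from changetable(changes dbo.some_edw_table, @last_version) as ct
join dbo.some_edw_table as t
    on t.id = ct.id   -- join on the table's primary key (assumed to be "id")

update dbo.sync_state
set last_sync_version = @current_version
where table_name = 'some_edw_table'
```

Run that every 5-15 minutes from a SQL Agent job and the OLTP side only ever touches the delta.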

Here’s the script I used to cluster the whole database at once:

    /* This script will find all the heaps in the database that have the columns you specify,
    then index them on those columns and write a record to jm_clusterize.
    It was written for SQL Server Enterprise Edition, taking advantage of online indexing,
    so if you have trouble, get rid of 'with (online = on)'.
    You should run this script after hours. It will *try* to build the index online, but if
    it can't, it will build it offline, which will lock the table for a few seconds/minutes. */

    declare @table varchar(255),
    @sqlcreate nvarchar(max),
    @sqlDrop nvarchar(max),
    @column1 varchar(255),
    @column2 varchar(255),
    @column3 varchar(255)

    /* Specify the columns you want to cluster on here. You always want to cluster on the create
    timestamp first, and NextGen is pretty good about having a create_timestamp on every table,
    but you will want to cluster some tables on seq_no or uniq_id, and patient demographics
    tables will need person_id first. I'm on the fence about clustering on 3 columns - it may be
    faster, it may be slower. */

    set @column1 = 'create_timestamp'
    set @column2 = 'enc_id'
    --set @column3 = 'seq_no'

    --drop table jm_clusterize
    if not exists (select name from sys.objects where name = 'jm_clusterize')
    create table jm_clusterize (table_name varchar(255), column1 varchar(255), column2 varchar(255), column3 varchar(255), create_statement nvarchar(max),
    drop_statement nvarchar(max), create_timestamp datetime, modify_timestamp datetime)

    declare c cursor for

    --This is the query that will show you all the heaps that could be clustered on the columns you pick
    SELECT SCH.name + '.' + TBL.name AS TableName
    FROM sys.tables AS TBL
         INNER JOIN sys.schemas AS SCH
             ON TBL.schema_id = SCH.schema_id
         INNER JOIN sys.indexes AS IDX
             ON TBL.object_id = IDX.object_id
                AND IDX.type = 0 -- 0 = heap
         INNER JOIN sys.columns AS sc1
             ON TBL.object_id = sc1.object_id
         INNER JOIN sys.columns AS sc2
             ON TBL.object_id = sc2.object_id
         --INNER JOIN sys.columns AS sc3
         --    ON TBL.object_id = sc3.object_id
         INNER JOIN sys.partitions AS p
             ON IDX.object_id = p.object_id AND IDX.index_id = p.index_id
         INNER JOIN sys.allocation_units AS a
             ON p.partition_id = a.container_id
    WHERE sc1.name = @column1
      AND sc2.name = @column2
      --AND sc3.name = @column3
    GROUP BY SCH.name, TBL.name
    ORDER BY SUM(a.total_pages) DESC, TableName

    open c

    fetch next from c into @table

    while @@fetch_status = 0
    begin

    set @sqlcreate = 'begin try create clustered index inx_jm_clstr on ' + @table + ' (' + @column1 + ','
    + @column2
    --+ ',' + @column3
    + ') with (online = on) end try
    begin catch create clustered index inx_jm_clstr on ' + @table + ' (' + @column1 + ','
    + @column2
    --+ ','+ @column3
    + ') end catch'

    set @sqlDrop = 'drop index inx_jm_clstr on ' + @table

    insert into jm_clusterize (table_name, column1, column2, column3, create_statement, drop_statement, create_timestamp, modify_timestamp)

    select @table, @column1, @column2, @column3, @sqlcreate, @sqlDrop, current_timestamp, current_timestamp

    --comment out the line below and it won't actually create the index
    exec sp_executesql @sqlcreate

    fetch next from c into @table
    end

    close c

    deallocate c

    select * from jm_clusterize
    where column1 = @column1
    and column2 = @column2
    and (column3 = @column3 or (column3 is null and @column3 is null))
