Question :
I’m a developer on a reporting application that runs complex user-defined queries, stores the results, and then sends the result to Excel generation, web-based tables, and graphs, and so on. Currently, we’re storing the entire result set serialized as a clob in an existing table.
These result sets may be large (some >1GB) and expensive to generate (many minutes). Regenerating them on-the-fly is not an option. They also are heterogeneous — each result set will likely have very different columns. The result sets are immutable — they will never be modified, although they will eventually be deleted. Each result set will be read occasionally, but often in bursts (e.g. if someone views the data via our web interface, then we expect many more requests soon).
Currently, these thousands of result sets are taking up a lot of DB space. In addition, it is difficult to manipulate the values other than to select the entire result set or the first n records. We would like to be able to sort and filter the result set. It seems logical that if we have tabular data that we’d like to sort and filter, then storing it as a table is the obvious answer.
However, I’m not sure what the right way would be to do this in Oracle. The naive solution would be to dynamically run CREATE TABLE
statements for every one, naming them from a GUID or something. This kind of vaguely feels like an unwise thing to do, maybe unless we also move this data to a separate database. I looked up global temporary tables, but we need them to persist across sessions. A co-worker suggested using external tables, but I’m not sure I understand the performance implications of that. Basically, I’m in over my head from an Oracle architecture standpoint, and I’d appreciate advice from more experienced developers and DBAs.
So: given these constraints, what approaches should I consider?
Answer :
You could use a data-defined schema, which isn’t terrific for performance or manageability but comes closer to doing what the database was designed to do.
CREATE TABLE report_result_tab (
report_result_id NUMBER NOT NULL,
row_id NUMBER NOT NULL,
column_name VARCHAR2(30) NOT NULL,
value_number NUMBER,
value_varchar VARCHAR2(4000));
ALTER TABLE report_result_tab
ADD CONSTRAINT report_result_pk
PRIMARY KEY (report_result_id, row_id, column_name);
With a table like that, you get a variable number of “columns” by putting the column as part of the key.
Let’s say we’re reporting on a table like this:
CREATE TABLE employee (
employee_id NUMBER,
last_name VARCHAR2(255));
Then to report on it, with one report having two records, it would look like this:
INSERT INTO report_result_tab (
report_result_id,
row_id,
column_name,
value_number,
value_varchar)
VALUES (
0, /* report_result_id */
0, /* row_id */
'EMPLOYEE_ID', /* column_name */
37, /* value_number */
NULL); /* value_varchar */
INSERT INTO report_result_tab (
report_result_id,
row_id,
column_name,
value_number,
value_varchar)
VALUES (
0, /* report_result_id */
0, /* row_id */
'LAST_NAME', /* column_name */
NULL, /* value_number */
'Smith'); /* value_varchar */
INSERT INTO report_result_tab (
report_result_id,
row_id,
column_name,
value_number,
value_varchar)
VALUES (
0, /* report_result_id */
1, /* row_id */
'EMPLOYEE_ID', /* column_name */
42, /* value_number */
NULL); /* value_varchar */
INSERT INTO report_result_tab (
report_result_id,
row_id,
column_name,
value_number,
value_varchar)
VALUES (
0, /* report_result_id */
1, /* row_id */
'LAST_NAME', /* column_name */
NULL, /* value_number */
'Jones'); /* value_varchar */
You’ll have to figure out what to put as the COLUMN_NAME
in the base data. You could use a numeric COLUMN_ID
which would save space. In either case, you’ll also have to figure out a mapping to get the user-friendly column names and their sort order.
The method above is advantageous for sparse data, but keeping so many copies of the column IDs is expensive.
I’ve also seen a much uglier version which could potentially be cheaper on disk space. You could define a particular report type, which is nothing more than a collection of column metadata, then build your table to handle the maximum number of columns of each type.
CREATE TABLE report_result_tab (
report_result_id NUMBER NOT NULL,
row_id NUMBER NOT NULL,
n00 NUMBER,
n01 NUMBER,
n02 NUMBER,
n03 NUMBER,
n04 NUMBER,
/* ... */
s00 VARCHAR2(4000),
s01 VARCHAR2(4000),
s02 VARCHAR2(4000),
s03 VARCHAR2(4000),
s04 VARCHAR2(4000),
/* ... */
d00 DATE,
d01 DATE,
d02 DATE,
d03 DATE,
d04 DATE);
INSERT INTO report_result_tab (
report_result_id,
row_id,
n00,
n01,
n02,
n03,
n04,
/* ... */
s00,
s01,
s02,
s03,
s04,
/* ... */
d00,
d01,
d02,
d03,
d04)
VALUES (
0, /* report_result_id */
0, /* row_id */
37, /* n00 */
NULL, /* n01 */
NULL, /* n02 */
NULL, /* n03 */
NULL, /* n04 */
/* ... */
'Smith', /* s00 */
NULL, /* s01 */
NULL, /* s02 */
NULL, /* s03 */
NULL, /* s04 */
/* ... */
NULL, /* d00 */
NULL, /* d01 */
NULL, /* d02 */
NULL, /* d03 */
NULL); /* d04 */
INSERT INTO report_result_tab (
report_result_id,
row_id,
n00,
n01,
n02,
n03,
n04,
/* ... */
s00,
s01,
s02,
s03,
s04,
/* ... */
d00,
d01,
d02,
d03,
d04)
VALUES (
0, /* report_result_id */
0, /* row_id */
42, /* n00 */
NULL, /* n01 */
NULL, /* n02 */
NULL, /* n03 */
NULL, /* n04 */
/* ... */
'Jones', /* s00 */
NULL, /* s01 */
NULL, /* s02 */
NULL, /* s03 */
NULL, /* s04 */
/* ... */
NULL, /* d00 */
NULL, /* d01 */
NULL, /* d02 */
NULL, /* d03 */
NULL); /* d04 */