Question :
I have just started on researching the feasibility of using MariaDB’s columnstore for OLAP, and I find inserts are very slow. This is MariaDB 10.5 on a debian 10 system, just an elderly desktop with 8GB RAM. This is the table, a trigger and the timings:
MariaDB [test]> show create table analytics_testG
*************************** 1. row ***************************
Table: analytics_test
Create Table: CREATE TABLE `analytics_test` (
`id` int(11) DEFAULT NULL,
`str` varchar(50) DEFAULT NULL
) ENGINE=Columnstore DEFAULT CHARSET=utf8mb4
1 row in set (0.000 sec)
MariaDB [test]> show create trigger test_trgG
*************************** 1. row ***************************
Trigger: test_trg
sql_mode: STRICT_TRANS_TABLES,ERROR_FOR_DIVISION_BY_ZERO,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION
SQL Original Statement: CREATE DEFINER=`root`@`localhost` trigger test_trg before insert on analytics_test for each row
begin
set new.str=concat('Value: ',new.id);
end
character_set_client: utf8
collation_connection: utf8_general_ci
Database Collation: utf8mb4_general_ci
Created: 2021-01-07 11:14:28.81
1 row in set (0.000 sec)
MariaDB [test]> insert into analytics_test set id=1;
Query OK, 1 row affected (0.817 sec)
MariaDB [test]> insert into analytics_test set id=2;
Query OK, 1 row affected (0.560 sec)
MariaDB [test]> insert into analytics_test set id=3;
Query OK, 1 row affected (0.611 sec)
MariaDB [test]> select * from analytics_test;
+------+----------+
| id | str |
+------+----------+
| 1 | Value: 1 |
| 2 | Value: 2 |
| 3 | Value: 3 |
+------+----------+
3 rows in set (0.085 sec)
I think .5 sec for a simple insert is very slow – compare to the same table in innodb:
MariaDB [test]> show create table testG
*************************** 1. row ***************************
Table: test
Create Table: CREATE TABLE `test` (
`id` int(11) DEFAULT NULL,
`str` varchar(50) DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4
1 row in set (0.000 sec)
MariaDB [test]> show create trigger test_trg1G
*************************** 1. row ***************************
Trigger: test_trg1
sql_mode: STRICT_TRANS_TABLES,ERROR_FOR_DIVISION_BY_ZERO,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION
SQL Original Statement: CREATE DEFINER=`root`@`localhost` trigger test_trg1 before insert on test for each row
begin
set new.str=concat('Value: ',new.id);
end
character_set_client: utf8
collation_connection: utf8_general_ci
Database Collation: utf8mb4_general_ci
Created: 2021-01-07 11:45:07.85
1 row in set (0.000 sec)
MariaDB [test]> insert into test set id=1;
Query OK, 1 row affected (0.010 sec)
Is there anything I need to do in order to make inserts perform better?
Answer :
Column-oriented database engines, not just MariaDB ColumnStore, are specifically optimized for read operations on large volumes of data. For this they use auxiliary data structures (like ColumnStore extent maps), query parallelism, compression etc. This comes at the cost of increased overhead for standard DML operations. This is often explicitly stated in various technical documents, like this quote from the InfiniDB technical overview:
Note that because InfiniDB is a column-oriented database, insert and delete statements will not normally run as fast on InfiniDB as they will on row-oriented databases (more blocks must be touched).
or this from the MariaDB architecture presentation (see slide 22):
DML Writes
- Slow compared to other engines
- INSERT is very slow compared to cpimport
Vendors often provide special tools for bulk loading data into columnar tables (e.g. the cpimport utility mentioned above) to address this problem. This fits well into the typical columnar engine use case, that of an OLAP system with (relatively) infrequent mass updates and heavy read-only workloads.
In other words, you should not judge a column-oriented database by its DML performance, because that’s not what it is designed for.