Question:
I have a badly designed table, and on top of that the number of columns keeps increasing (it is already ~500), because new columns are added each time a new kind of machine is discovered (e.g., router, adapter, Windows server, etc.). The table stores the output of the commands we run after discovering a machine, which is handled by the back-end.
For example: we discover Docker running on a machine. We fetch the container and image details, then save those as blueprints, with their properties stored in the table under the appropriate columns.
The table contains about 1 million records, and we frequently join against it because it is our main table. Since normalizing the database is not feasible right now, can anybody suggest a workaround (like partitioning or sharding)?
Thanks in advance. Comment if you need more details.
Answer:
I think you should analyze the scenario from a different perspective, because it looks quite likely that continuing to work with the current structure of the table in question will end up being far less feasible than redesigning it at all the corresponding tiers, i.e.:
- the conceptual (modeling and defining, with the appropriate precision, [i] the types of things that the table represents and [ii] the respective rules of the business context in which said types of things are involved);
- the logical (reflecting the types of things of interest in a database structure with new normalized elements and accurate data types, and setting up the proper constraints so that the data contained complies with the pertinent business rules; see the sketch after this list); and
- the physical (establishing a proper indexing strategy based on the exact data manipulation tendencies, installing more powerful hardware, perhaps partitioning and sharding, upgrading the network bandwidth, etc.) levels of abstraction.
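To make the logical tier concrete, here is a minimal sketch of one possible normalized shape: a supertype table holding every discovered machine, plus a subtype table for the Docker example from the question. All table and column names are hypothetical assumptions, since the actual dataset has not been described:

```sql
-- Hypothetical supertype: one row per discovered machine of any kind.
CREATE TABLE machine (
  machine_id    INT UNSIGNED NOT NULL AUTO_INCREMENT,
  machine_type  VARCHAR(50)  NOT NULL,   -- e.g. 'router', 'adapter', 'windows_server', 'docker_host'
  hostname      VARCHAR(255) NOT NULL,
  discovered_at DATETIME     NOT NULL,
  PRIMARY KEY (machine_id),
  KEY idx_machine_type (machine_type)
) ENGINE=InnoDB;

-- Hypothetical subtype: attributes that only exist for Docker hosts,
-- instead of dozens of mostly-NULL columns on the main table.
CREATE TABLE docker_container (
  container_id VARCHAR(64)  NOT NULL,    -- Docker's own container ID
  machine_id   INT UNSIGNED NOT NULL,
  image_name   VARCHAR(255) NOT NULL,
  created_at   DATETIME     NOT NULL,
  PRIMARY KEY (container_id),
  KEY idx_dc_machine (machine_id),
  CONSTRAINT fk_dc_machine FOREIGN KEY (machine_id)
    REFERENCES machine (machine_id)
) ENGINE=InnoDB;
```

Each new kind of machine then becomes a new subtype table rather than a new batch of columns on the main table.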
Keeping the present table structure (~500 columns!) demands approaching it (manipulation- and administration-wise, etc.) in inefficient ways, and if you expect that the volume of data will grow considerably, then the system will scale very poorly and could become inoperable. It might be much better to redirect all the effort currently devoted to handling this non-optimal situation into a major reorganization.
Of course, there may well be current aspects of the table that can be reused later (e.g., the quite valuable data, which probably needs a clean-up), but I would decidedly consider rearranging all the factors mentioned above, at least partially and at the major hot spots, instead of putting up workarounds that are in fact palliatives. Better late than never.
You must bite the bullet and do a major revamp of the dataset.
500 columns is untenable and must be part of the revamp. Read this about how not to do it, plus suggestions on using JSON as an alternative. Keep a few real columns for the info needed for fetching and filtering; throw most of the remaining columns into a single JSON column.
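As a rough sketch of that shape, assuming hypothetical names (the question calls these rows "blueprints", so that name is used here):

```sql
-- Hypothetical sketch: a handful of real columns for fetching/filtering,
-- and one JSON document holding everything else.
CREATE TABLE blueprint (
  blueprint_id  INT UNSIGNED NOT NULL AUTO_INCREMENT,
  machine_type  VARCHAR(50)  NOT NULL,   -- used in WHERE clauses, so a real column
  hostname      VARCHAR(255) NOT NULL,   -- used for lookups, so a real column
  discovered_at DATETIME     NOT NULL,
  attributes    JSON         NOT NULL,   -- the other ~490 former columns live here
  PRIMARY KEY (blueprint_id),
  KEY idx_bp_type (machine_type)
) ENGINE=InnoDB;

INSERT INTO blueprint (machine_type, hostname, discovered_at, attributes)
VALUES ('docker_host', 'node-17', NOW(),
        JSON_OBJECT('docker_version', '24.0.5', 'container_count', 12));
```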
Think about what queries you need. This is important when deciding which columns to keep separate and easily handled by MySQL, versus left to the application.
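For example, a filter can run against the real, indexed column while the details come out of the JSON; and if one JSON field turns out to be filtered on often, MySQL 5.7+ can index it through a generated column. The names here are the same hypothetical ones as above:

```sql
-- Filter on the real, indexed column; pull details out of the JSON.
SELECT hostname,
       attributes->>'$.docker_version' AS docker_version
FROM   blueprint
WHERE  machine_type = 'docker_host';

-- If a JSON field is filtered on often, index it via a generated column.
ALTER TABLE blueprint
  ADD COLUMN container_count INT
    GENERATED ALWAYS AS (CAST(attributes->>'$.container_count' AS SIGNED)) VIRTUAL,
  ADD KEY idx_bp_containers (container_count);
```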
A million rows is too small for PARTITIONing or sharding.
You have not described the dataset well enough to say whether to have multiple tables (one for containers, one for servers, etc.) linked together, whether there is much opportunity for normalization, etc.
“Vertical partitioning” (two tables with a common PK) may be an option if some subset of the current ‘columns’ is ‘optional’.
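A minimal sketch of that layout, with placeholder names (which columns actually move depends on your dataset):

```sql
-- Hot, always-present columns stay in the main table...
CREATE TABLE machine_core (
  machine_id   INT UNSIGNED NOT NULL AUTO_INCREMENT,
  machine_type VARCHAR(50)  NOT NULL,
  hostname     VARCHAR(255) NOT NULL,
  PRIMARY KEY (machine_id)
) ENGINE=InnoDB;

-- ...while the optional/rarely-used subset moves to a second table
-- sharing the same primary key (1:1, row present only when needed).
CREATE TABLE machine_extra (
  machine_id    INT UNSIGNED NOT NULL,  -- same PK value as machine_core
  rarely_used_a TEXT NULL,              -- placeholder names: the real
  rarely_used_b TEXT NULL,              -- columns depend on your data
  PRIMARY KEY (machine_id),
  CONSTRAINT fk_extra_core FOREIGN KEY (machine_id)
    REFERENCES machine_core (machine_id)
) ENGINE=InnoDB;

-- Join only when the optional columns are actually needed:
SELECT c.hostname, e.rarely_used_a
FROM machine_core c
LEFT JOIN machine_extra e USING (machine_id);
```

This keeps the frequently-joined table narrow, so the common queries touch far less data per row.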