Question :
We will get several thousands products in our database. The problem is that one product can have several different names e.g. “iPad 16GByte UMTS schwarz” or “iPad 16GB 3G black” and so on. But they are the same product. We would like to combine them together for an exact analysis (like how many products in which city in a specific period of time). What would be the best approach? What are the best experts for this?
Answer :
I’m not sure how you would do this with a nosql database like mongodb but in a typical data warehouse I would do this with the dimension table hierarchy.
Product_ID Full_Product_Nm Product_Lvl1 Product_Lvl2
---------- --------------- -------------- ------------
1 iPad 16GByte UMTS schwarz iPad iPad 16gb
2 iPad 16GB 3G black iPad iPad 16gb
3 iPad 64GByte UMTS schwarz iPad iPad 64gb
4 iPad 64GB 3G black iPad iPad 64gb
Then I could join my fact table to the dimension and pull out the hierarchy level that I would like to report on.
So in your example I could pull out the Product_Lvl1 to get iPad but if I wanted to compare sales/inventory of 64gb compared to the 16gb iPad I could do that using the Product_Lvl2.
Sounds like to need to scrub this data and load it into a cube. SQL Server Integration Services has some methods which are very, very good at this sort of data scrubbing, and SQL Server Analysis Services is perfect for this sort of data aggregation. You can download the free trial for them from Microsoft’s site (www.microsoft.com/sql).