Selecting a large data set on Windows and Linux

Question :

I need to parse a large data set, something like:

$result = mysql_query("SELECT * FROM largeTable");
while ($r = mysql_fetch_assoc($result)) {
    // Do something with $r
}

I only need one row in memory at a time, because I operate on a single row at a time.

The fact is that on Windows, at the end, memory_get_peak_usage() reports 500 MB, while on Debian it is 4 MB.

Is something wrong with my Windows setup?

If this question doesn't belong here, please move it to Stack Overflow. Thanks!

Answer :

It depends on your client libraries, the way your app handles memory, and what you're doing inside the fetch loop.

Pulling back one row at a time is a dangerous design pattern for a database server to handle. The server has to materialize your full result set first, then spoon-feed it to you one row at a time while you slice and dice the results. You might be holding locks or slowing down the database server the whole time. Instead, pull all of the data into memory in your app and do the slicing and dicing after you have released your locks, OR do your slicing and dicing in batches on the database server. Batch (set-based) processing is far more efficient than row-by-row processing. In the DBA community, row-by-row processing is known as RBAR: Row By Agonizing Row.
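To make the batch idea concrete, here is a minimal sketch using Python's built-in sqlite3 module purely for illustration (the original question is PHP/MySQL, and the table name and column layout here are invented): instead of fetching one row at a time, the client pulls rows in fixed-size batches, so only one batch is ever held in memory.

```python
# Batch fetching versus row-by-row (RBAR): a sketch with sqlite3.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE largeTable (id INTEGER PRIMARY KEY, val INTEGER)")
conn.executemany("INSERT INTO largeTable (val) VALUES (?)",
                 [(i,) for i in range(10_000)])

cur = conn.execute("SELECT id, val FROM largeTable")
total = 0
while True:
    batch = cur.fetchmany(1000)   # at most 1000 rows in client memory at once
    if not batch:
        break
    total += sum(val for _id, val in batch)

print(total)  # sum of 0..9999 = 49995000
```

The same pattern applies to MySQL from PHP: fetch in chunks (or use an unbuffered query) rather than letting the client library buffer the entire result set.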

If you want to do this in an on-line transaction, then you probably have the wrong database design.

If you are trying to do this in batch, then select only the fields of interest to a flat file, or export to a flat file then process your data from the flat file. You could even split the output file to several files and run parallel tasks to process the data. This will at least free your database server resources during your processing.
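The export-then-process approach above can be sketched as follows, again in Python with sqlite3 and the csv module for illustration (file names and the table layout are assumptions): export only the fields of interest to a flat file, close the database connection, then crunch the file with the database no longer involved.

```python
# Flat-file batch processing: export the columns of interest, release the
# database, then process the file independently of the server.
import csv
import os
import sqlite3
import tempfile

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE largeTable (id INTEGER PRIMARY KEY, name TEXT, val INTEGER)")
conn.executemany("INSERT INTO largeTable (name, val) VALUES (?, ?)",
                 [(f"row{i}", i) for i in range(100)])

# 1) Export only the fields of interest to a flat file.
export = os.path.join(tempfile.mkdtemp(), "export.csv")
with open(export, "w", newline="") as f:
    writer = csv.writer(f)
    for row in conn.execute("SELECT id, val FROM largeTable"):
        writer.writerow(row)
conn.close()  # database resources are free while we process the file

# 2) Process the flat file; memory use stays flat regardless of row count.
total = 0
with open(export, newline="") as f:
    for _id, val in csv.reader(f):
        total += int(val)

print(total)  # 0 + 1 + ... + 99 = 4950
```

Splitting the export into several files and running one worker per file, as suggested above, is a straightforward extension of this sketch.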

This practice is not uncommon in data warehousing applications.
