I have a database with ‘books’ (short stories for children) and it would be extremely informative to have word counts of each word in the books.
I figured out how to get the word count for each word using:
SELECT SUM ( ROUND ( (LENGTH(pageText) - LENGTH (REPLACE (pageText, "Word", ""))) /LENGTH("Word") ) ) FROM pages WHERE bookID = id;
Which works wonderfully for counting the words. BUT it requires me to go through each book, and get each word out, and run it through that function (I have it saved as a Stored Procedure.)
I have a table that contains each word, with no duplicates.
My question: is there a way I can do some kind of “for each” loop on the Words table using my stored procedure?
ie. pass the stored procedure a book ID and a word and record the result. Doing EVERY word, for EVERY book. Thus saving me a LOT of manual time… Is this something I should even be doing from the DB side? Should I attempt it with PHP instead?
Honestly any input is greatly appreciated!
Create a second procedure that uses two nested cursors.
Cursors in stored procedures allow you to do a very non-SQL-like thing: iterate through a result set one row at a time, putting the selected column values into variables and doing things with them.
They are easily misused, since SQL, being declarative rather than procedural, should usually not need “for each”-type operations, but in this case, it seems like a valid application.
Once you get the hang of them, cursors are easy, but they do require a structured approach in their supporting code that isn’t always intuitive.
I recently provided some fairly standard “boilerplate” code for working with a cursor to call a stored procedure in an answer on Stack Overflow, and I’ll borrow very heavily from that answer, below.
Using a cursor requires some standard boilerplate code to surround it.
SELECT the values you want to pass, from wherever you’re getting them (which could be a temporary table, base table, or view, and can include calls to stored functions) and then call your existinf procedure with those values.
Here is a syntactically valid example of the necessary code, with comments to explain what each component is doing.
This example uses 2 columns to pass 2 values to the called procedure.
Note that there events that happen here are in a specific order for a reason. Variables have to be declared first, cursors have to be declared before their continue handlers, and loops have to follow all of those things.
You can’t do things out of order, so when you nest one cursor inside of another, you have to reset the procedure scope by nesting additional code inside
END blocks within the procedure body; for example, if you needed a second cursor inside the loop, you’d just declare it inside the loop, inside another
DELIMITER $$ DROP PROCEDURE IF EXISTS `my_proc` $$ CREATE PROCEDURE `my_proc`(arg1 INT) -- 1 input argument; you might need more or fewer BEGIN -- declare the program variables where we'll hold the values we're sending into the procedure; -- declare as many of them as there are input arguments to the second procedure, -- with appropriate data types. DECLARE val1 INT DEFAULT NULL; DECLARE val2 INT DEFAULT NULL; -- we need a boolean variable to tell us when the cursor is out of data DECLARE done TINYINT DEFAULT FALSE; -- declare a cursor to select the desired columns from the desired source table1 -- the input argument (which you might or might not need) is used in this example for row selection DECLARE cursor1 -- cursor1 is an arbitrary label, an identifier for the cursor CURSOR FOR SELECT t1.c1, t1.c2 FROM table1 t1 WHERE c3 = arg1; -- this fancy spacing is of course not required; all of this could go on the same line. -- a cursor that runs out of data throws an exception; we need to catch this. -- when the NOT FOUND condition fires, "done" -- which defaults to FALSE -- will be set to true, -- and since this is a CONTINUE handler, execution continues with the next statement. DECLARE CONTINUE HANDLER FOR NOT FOUND SET done = TRUE; -- open the cursor OPEN cursor1; my_loop: -- loops have to have an arbitrary label; it's used to leave the loop LOOP -- read the values from the next row that is available in the cursor FETCH NEXT FROM cursor1 INTO val1, val2; IF done THEN -- this will be true when we are out of rows to read, so we go to the statement after END LOOP. LEAVE my_loop; ELSE -- val1 and val2 will be the next values from c1 and c2 in table t1, -- so now we call the procedure with them for this "row" CALL the_other_procedure(val1,val2); -- maybe do more stuff here END IF; END LOOP; -- execution continues here when LEAVE my_loop is encountered; -- you might have more things you want to do here -- the cursor is implicitly closed when it goes out of scope, or can be explicitly closed if desired CLOSE cursor1; END $$ DELIMITER ;