From: Rene Veerman on
Hi.

I'm looking for the most efficient way to insert several records and
retrieve the auto_increment values for the inserted rows, while
avoiding crippling concurrency problems caused by multiple php threads
doing this on the same table at potentially the same time.

I'm using mysql atm, so i thought "stored procedures!"..
But alas, mysql docs are very basic.

I got the gist of how to setup a stored proc, but how to retrieve a
list of auto_increment ids still eludes me; last_insert_id() only
returns for the last row i believe.
So building an INSERT (...) VALUES (...),(...) at the php end, is
probably not the way to go then.

But the mysql docs don't show how to pass an array to a stored
procedure, so i can't just have the stored proc loop over an array,
insert per row, retrieve last_insert_id() into temp table, and return
the temp table contents for a list of auto_increment ids for inserted
rows.

Any clues are greatly appreciated..
I'm looking for the most sql server independent way to do this.
From: Eric Lee on
On Sat, Feb 13, 2010 at 2:07 PM, Rene Veerman <rene7705(a)gmail.com> wrote:

> Hi.
>
> I'm looking for the most efficient way to insert several records and
> retrieve the auto_increment values for the inserted rows, while
> avoiding crippling concurrency problems caused by multiple php threads
> doing this on the same table at potentially the same time.
>
> I'm using mysql atm, so i thought "stored procedures!"..
> But alas, mysql docs are very basic.
>
> I got the gist of how to setup a stored proc, but how to retrieve a
> list of auto_increment ids still eludes me; last_insert_id() only
> returns for the last row i believe.
> So building an INSERT (...) VALUES (...),(...) at the php end, is
> probably not the way to go then.
>
> But the mysql docs don't show how to pass an array to a stored
> procedure, so i can't just have the stored proc loop over an array,
> insert per row, retrieve last_insert_id() into temp table, and return
> the temp table contents for a list of auto_increment ids for inserted
> rows.
>
> Any clues are greatly appreciated..
> I'm looking for the most sql server independent way to do this.
>
>
Rene

I have not been worked with mysql multi-insert before.
But just did a simple test on my mysql 5.0 copy.

I assume that you are using MyISAM table and will lock its read, writel
when inserting data.

When multi-insert was done, and did a select last_insert_id(). I saw that
only
the first inserted id was returned. Please take a look the following steps:


mysql> select * from temp;
Empty set (0.00 sec)

mysql> insert into temp (firstname, price) values ('dd', 10), ('cc', 3),
('bb', 99);
Query OK, 3 rows affected (0.00 sec)
Records: 3 Duplicates: 0 Warnings: 0

mysql> select last_insert_id();
+------------------+
| last_insert_id() |
+------------------+
| 1 |
+------------------+
1 row in set (0.00 sec)

mysql> insert into temp (firstname, price) values ('dd', 10), ('cc', 3),
('bb', 99);
Query OK, 3 rows affected (0.00 sec)
Records: 3 Duplicates: 0 Warnings: 0

mysql> select last_insert_id();
+------------------+
| last_insert_id() |
+------------------+
| 4 |
+------------------+
1 row in set (0.00 sec)


So, let's say three records was inserted, and the first inserted id was 1.
You get id from 1 to 3.

! This will not work on transaction-based insert !

Just a thought and tested on mysql but not on php.



Regards,
Eric

> --
> PHP General Mailing List (http://www.php.net/)
> To unsubscribe, visit: http://www.php.net/unsub.php
>
>
From: Eric Lee on
On Sat, Feb 13, 2010 at 6:55 PM, Jochem Maas <jochem(a)iamjochem.com> wrote:

> Op 2/13/10 10:08 AM, Lester Caine schreef:
> > Rene Veerman wrote:
> >> Hi.
> >>
> >> I'm looking for the most efficient way to insert several records and
> >> retrieve the auto_increment values for the inserted rows, while
> >> avoiding crippling concurrency problems caused by multiple php threads
> >> doing this on the same table at potentially the same time.
> >
> >> Any clues are greatly appreciated..
> >> I'm looking for the most sql server independent way to do this.
> >
> > Rene
> > The 'correct' way of doing this is to use a 'sequence' which is
> > something introduced in newer versions of the SQL standard.
> > Firebird(Interbase) has had 'generators' since the early days (20+
> > years) and these provide a unique number which can then be inserted into
> > the table.
> >
> > ADOdb emulates sequences in MySQL by creating a separate table for the
> > insert value, so you can get the next value and work with it, without
> > any worries. The only 'problem' is in situations were an insert is
> > rolled back, a number is lost, but that is ACTUALLY the correct result,
> > since there is no way of knowing that a previous insert WILL commit when
> > several people are adding records in parallel.
>
> this is all true and correct ...
>
> but that doesn't answer the problem. how do you get the IDs of all the
> records
> that we're actually inserted in a multi-insert statement, even if you
> generate the
> IDs beforehand you have to check them to see if any one of the set INSERT
> VALUEs failed.
>
> @Rene:
>
> I don't think there is a really simple way of doing this in a RDBMS
> agnostic
> way, each RDBMS has it's own implementation - although many are alike ...
> and MySQL is
> pretty much the odd one out in that respect.
>
> it might require a reevaluation of the problem, to either determine that
> inserting
> several records at once is not actually important in terms of performance
> (this would depend
> on how critical the speed is to you and exactly how many records you're
> likely to be inserting
> in a given run) and whether you can rework the logic to do away with the
> requirement to
> get at the id's of the newly inserted records ... possibly by indentifying
> a unique
> indentifier in the data that you already have.
>
> one way to get round the issue might be to use a generated GUID and have an
> extra field which
> you populate with that value for all records inserted with a single query,
> as such it could
> function as kind of transaction indentifier which you could use to retrieve
> the newly
> inserted id's with one extra query:
>
> $sql = "SELECT id FROM foo WHERE insert_id = '{$insertGUID}'";
>
> ... just an idea.
>
> >
>
>
>
Hi

I would like to learn more correct way from both of you.
May I ask what is a sequences ?


Thanks !


Regards,
Eric

> --
> PHP General Mailing List (http://www.php.net/)
> To unsubscribe, visit: http://www.php.net/unsub.php
>
>
From: Rene Veerman on
On Sat, Feb 13, 2010 at 3:46 PM, Joseph Thayne <webadmin(a)thaynefam.org> wrote:
> In order to make this as "sql server independent" as possible, the first
> thing you need to do is not use extended inserts as that is a MySQL
> capability.  If you are insistent on using the extended inserts, then look
> at the mysql_info() function.  That will return the number of rows inserted,
> etc. on the last query.
>

But as previous posters had pointed out (thanks) i can't see which rows failed.
As i'm dealing with 3rd-party data, that's an issue.

I also didn't know it was mysql-specific, that multi-insert..

And i tried looking up the sql-standard docs, only to find that they
cost over 200 euro per
part (14 parts).
I've sent angry emails to ansi.org and iso.org (commercial lamers
operating under .org, yuck), about how cool a business model that
charges a percentage of profits per implementation would be, instead
of charging high prices up-front for a potentially bad/complicated
piece of spec.

But back to the problem at hand; it looks like i'll have to forget
about using 100s of threads for my newsscraper at the same time, and
settle for a few dozen instead.
Then i can just do single inserts (per hit) and retrieve the last_insert_id().

One question remains: it is probably not (concurrently-)safe to do a
sql-insert from php and then a last_insert_id() also from php..?
I still have to build a stored procedure to do-the-inserting and
return the last_insert_id()?
From: Larry Garfield on
On Sunday 14 February 2010 03:15:16 am Rene Veerman wrote:
> On Sat, Feb 13, 2010 at 3:46 PM, Joseph Thayne <webadmin(a)thaynefam.org>
wrote:
> > In order to make this as "sql server independent" as possible, the first
> > thing you need to do is not use extended inserts as that is a MySQL
> > capability. If you are insistent on using the extended inserts, then
> > look at the mysql_info() function. That will return the number of rows
> > inserted, etc. on the last query.
>
> But as previous posters had pointed out (thanks) i can't see which rows
> failed. As i'm dealing with 3rd-party data, that's an issue.
>
> I also didn't know it was mysql-specific, that multi-insert..
>
> And i tried looking up the sql-standard docs, only to find that they
> cost over 200 euro per
> part (14 parts).
> I've sent angry emails to ansi.org and iso.org (commercial lamers
> operating under .org, yuck), about how cool a business model that
> charges a percentage of profits per implementation would be, instead
> of charging high prices up-front for a potentially bad/complicated
> piece of spec.
>
> But back to the problem at hand; it looks like i'll have to forget
> about using 100s of threads for my newsscraper at the same time, and
> settle for a few dozen instead.
> Then i can just do single inserts (per hit) and retrieve the
> last_insert_id().
>
> One question remains: it is probably not (concurrently-)safe to do a
> sql-insert from php and then a last_insert_id() also from php..?
> I still have to build a stored procedure to do-the-inserting and
> return the last_insert_id()?

That's perfectly safe to do as long as it's within the same PHP request.
(Well, the same DB connection, really, which is 99% of the time the same
thing.) last_insert_id() is connection-specific.

I believe (it's been a while since I checked) the MySQL documentation says
that last_insert_id() with a multi-insert statement is not reliable and you
shouldn't rely on it having a worthwhile meaning anyway. Or at least it said
something that made me conclude that it's safest to assume it's unreliable for
a multi-insert statement.

If you're concerned about performance of that many bulk writes, there's 3
things you can do to help:

1) Use InnoDB. It uses row-level locking so lots of writes doesn't lock your
whole table as in MyISAM tables.

2) Disable indexes on the table in question before running your bulk insert,
then re-enable them. That's considerably faster than rebuilding the index
after each and every insert as they only need to be rebuilt once.

3) If you're on InnoDB, using transactions can sometimes give you a
performance boost because the writes hit disk all at once when you commit.
There may be other side effects and trade offs here, though, so take with a
grain of salt.

--Larry Garfield