StressAsync fails #244

Closed · tofutim opened this issue Nov 11, 2015 · 18 comments

tofutim commented Nov 11, 2015

I get about 400 errors in NCrunch (from SQLite reporting 'busy') and 17-18 on AppVeyor.

tofutim commented Nov 11, 2015

How do we get the stress test to pass? Is this a bug with sqlite3 in Windows?

oysteinkrog (Owner) commented Nov 11, 2015

This test has been buggy for a long time; I've tried to fix it but have not yet been able to.
To be honest, I don't really understand why it fails.

tofutim commented Nov 11, 2015

When I debug in VS I keep running into 'busy' and 'database locked'. In NCrunch I get around 400 of these errors, and the test fails because it asserts zero errors and I have 400. On AppVeyor there are somehow only 15-40 errors, but it still stops in the same place.

In an earlier discussion on Generic and Mac, someone mentioned that their test was failing at 499 compared to 500, which means they did not hit any errors.

sjlombardo commented Nov 12, 2015

Since this test is running a large number of concurrent operations I would suggest two changes:

  1. After opening the connection set PRAGMA busy_timeout to a reasonable value, perhaps start with 1000 and see how that works. This will allow a given connection to block for a short period of time instead of immediately returning an error if the database is locked by another operation.
  2. Use WAL mode for the test database. It will drastically improve the ability of operations to run concurrently; with WAL mode readers will not block writers and writers will not block readers.

These two changes may resolve the errors you are seeing.
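
As a minimal sketch of applying both suggestions (not the test's actual code), assuming the SQLiteAsyncConnection/ExecuteScalarAsync API used elsewhere in this thread, with globalConn standing in for the test connection and 1000 ms as the starting timeout:

    // Sketch only: configure the test database before the stress run starts.
    // busy_timeout makes this connection wait up to 1000 ms for a lock instead
    // of failing immediately with Busy; WAL lets readers and writers overlap.
    await globalConn.ExecuteScalarAsync<string>("PRAGMA busy_timeout = 1000");
    await globalConn.ExecuteScalarAsync<string>("PRAGMA journal_mode = wal");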

tofutim commented Nov 12, 2015

Good suggestions, thanks!

tofutim commented Nov 12, 2015

@sjlombardo I tried your suggestions. Interestingly, journal_mode = delete (the default) and journal_mode = wal produce characteristically different errors. Here are sample error lists for delete and for wal with a busy_timeout of 100; I also changed the test to 100 items because 500 was taking too long. wal is better than delete, but increasing the timeout for wal does not seem to improve it, which is odd given that the error is Busy.

journal_mode: wal
busy_timeout: 100
0 Busy
1 Busy
2 Busy
3 Busy
4 Busy
5 Busy
6 Busy
7 Busy
8 Busy
9 Busy
10 Busy
11 Busy
12 Busy
13 Busy
14 Busy
15 Busy
16 Busy
17 Busy
18 Busy
19 Busy
20 Busy
21 Busy
22 Busy
23 Busy
24 Busy
25 Busy
26 Busy
27 Busy
28 Busy
29 Busy
30 Busy
31 Busy
32 Busy
33 Busy
  Expected: 0
  But was:  34

and

journal_mode: delete
busy_timeout: 100
0 database is locked
1 database is locked
2 database is locked
3 database is locked
4 Failed query
5 database is locked
6 Busy
7 Failed query
8 database is locked
9 database is locked
10 database is locked
11 database is locked
12 database is locked
13 database is locked
14 Busy
15 Busy
16 database is locked
17 database is locked
18 database is locked
19 database is locked
20 database is locked
21 database is locked
22 database is locked
23 database is locked
24 database is locked
25 database is locked
26 Busy
27 database is locked
28 database is locked
29 database is locked
30 Failed query
31 database is locked
32 database is locked
33 Busy
34 database is locked
35 Busy
36 database is locked
37 database is locked
38 database is locked
39 database is locked
40 database is locked
41 database is locked
42 database is locked
43 Busy
44 database is locked
45 database is locked
46 database is locked
47 Busy
48 Failed query
49 database is locked
50 database is locked
51 Failed query
52 database is locked
53 database is locked
54 database is locked
55 Failed query
56 database is locked
57 database is locked
58 database is locked
59 database is locked
60 Busy
61 Busy
62 Busy
63 database is locked
64 Busy
  Expected: 0
  But was:  65
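
For reference, the journal_mode and busy_timeout headers at the top of the logs above can be read back from the connection before the run starts; a rough sketch using the same ExecuteScalarAsync calls as elsewhere in this thread (the Debug.WriteLine output format is only illustrative):

    // Sketch: print the effective settings so they show up in the test output.
    var journalMode = await globalConn.ExecuteScalarAsync<string>("PRAGMA journal_mode");
    var busyTimeout = await globalConn.ExecuteScalarAsync<int>("PRAGMA busy_timeout");
    Debug.WriteLine("journal_mode: " + journalMode);
    Debug.WriteLine("busy_timeout: " + busyTimeout);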

tofutim commented Nov 12, 2015

I just discovered that busy_timeout is a per-connection setting, so it is reset for each new connection. I probably have to set it on each of the threads' connections. That is a little scary.
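
As a sketch of what that would mean inside each task (not the test's real code; the openConnection helper name is hypothetical and stands in for however each task creates its own connection):

    // Sketch: busy_timeout is per-connection, so each task must set it on its
    // own connection right after opening it, before doing any inserts.
    var conn = openConnection();   // hypothetical helper returning a new async connection
    await conn.ExecuteScalarAsync<string>("PRAGMA busy_timeout = 1000");
    // ... the task's INSERT / SELECT work follows here ...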

tofutim commented Nov 12, 2015

Sadly, even setting busy_timeout on each of the runs' connections did not change the Busy errors. I raised it from 100 to 1000 with no real change in the Busy rate, which stays at about 33%.

tofutim commented Nov 12, 2015

http://beets.radbox.org/blog/sqlite-nightmare.html was interesting; I wonder if we are facing the same thing in this test.

tofutim commented Nov 13, 2015

The Busy error always comes from the INSERT rather than from the SELECT that reads the table back.

tofutim commented Nov 13, 2015

I was able to make progress by adding this to PreparedSqlLiteInsertCommand

        Result r;
        lock (_locker)
        {
            r = sqlitePlatform.SQLiteApi.Step(Statement);
        }

Now there are no Busy errors, but the final count is 94 instead of 100. The lock object is declared as

    static readonly object _locker = new object();

But does this mean we need this everywhere? Isn't SQLite supposed to handle this? :P

tofutim commented Nov 13, 2015

Ok, with the locker code in place, the new problem is that only 94 of the 100 rows are there when the count is taken. However, if you Task.Delay(1000) before reading the count, everything is actually in there. This means the task appears to be complete before its work is really done. Not sure how that is possible, but maybe each connection has a separate cache of some sort.

    await Task.Delay(1000);
    int count = await globalConn.Table<Customer>().CountAsync();
    Assert.AreEqual(n, count, "Not enough items in table");

I experimented with some values, and a delay of 100-200 ms usually works. Maybe we should be using the same connection instead of opening a lot of new ones? I don't know.

tofutim commented Nov 13, 2015

Once I switch to 'wal' I can't switch back out of it at the end of the test.

        await globalConn.ExecuteScalarAsync<string>("PRAGMA journal_mode = delete"); // default

at the end gives me 'database is locked', presumably because switching away from WAL requires exclusive access to the database and other connections are still using it. ;(

tofutim commented Nov 13, 2015

Why is

    lock (_locker)
    {
        r = sqlitePlatform.SQLiteApi.Step(Statement);
    }

needed when the connection should already be locked by SQLiteConnectionWithLock?

Also, this is bad because _locker is static, so it would block concurrent writes even to completely different databases.
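
One possible direction, sketched here only to illustrate the point and not as the library's actual design: take the lock object from the owning connection rather than from a static field, so each open database gets its own lock. The _conn field and its SyncObject member are hypothetical names.

    // Sketch: scope the lock to one connection instead of one static object,
    // so writes to unrelated databases are not serialized against each other.
    // _conn / SyncObject are illustrative; the real locking belongs with
    // SQLiteConnectionWithLock.
    Result r;
    lock (_conn.SyncObject)
    {
        r = sqlitePlatform.SQLiteApi.Step(Statement);
    }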

tofutim commented Nov 13, 2015

If I change the task connection to the global connection, it does not remove the need for the Task.Delay. It feels like the WhenAll is not really waiting for all the tasks to complete.

tofutim commented Nov 13, 2015

Sorry for this long thread; I wonder if anyone will ever read it. await Task.WhenAll(tasks) does not really do what we think it does... check this out: it says 100 tasks are completed (at the end of the WhenAll), but there are actually 6 more still running.

journal_mode: wal
synchronous: 2
page_size: 1024
busy_timeout: 100
task 99 with id 1 and name 99
task 0 with id 2 and name 0
task 1 with id 3 and name 1
task 2 with id 4 and name 2
task 98 with id 5 and name 98
task 3 with id 6 and name 3
task 4 with id 7 and name 4
task 5 with id 8 and name 5
task 6 with id 9 and name 6
task 97 with id 10 and name 97
task 7 with id 11 and name 7
task 8 with id 12 and name 8
task 9 with id 13 and name 9
task 10 with id 14 and name 10
task 96 with id 15 and name 96
task 11 with id 16 and name 11
task 12 with id 17 and name 12
task 14 with id 19 and name 14
task 13 with id 18 and name 13
task 95 with id 20 and name 95
task 15 with id 21 and name 15
task 16 with id 22 and name 16
task 17 with id 23 and name 17
task 94 with id 24 and name 94
task 18 with id 25 and name 18
task 19 with id 26 and name 19
task 20 with id 27 and name 20
task 93 with id 28 and name 93
task 21 with id 29 and name 21
task 22 with id 30 and name 22
task 23 with id 31 and name 23
task 92 with id 32 and name 92
task 24 with id 33 and name 24
task 25 with id 34 and name 25
task 26 with id 35 and name 26
task 27 with id 36 and name 27
task 91 with id 37 and name 91
task 28 with id 38 and name 28
task 29 with id 39 and name 29
task 30 with id 40 and name 30
task 31 with id 41 and name 31
task 90 with id 42 and name 90
task 32 with id 43 and name 32
task 34 with id 45 and name 34
task 35 with id 46 and name 35
task 89 with id 47 and name 89
task 36 with id 48 and name 36
task 37 with id 49 and name 37
task 38 with id 50 and name 38
task 88 with id 51 and name 88
task 39 with id 52 and name 39
task 40 with id 53 and name 40
task 41 with id 54 and name 41
task 87 with id 55 and name 87
task 42 with id 56 and name 42
task 43 with id 57 and name 43
task 44 with id 58 and name 44
task 33 with id 44 and name 33
task 86 with id 59 and name 86
task 45 with id 60 and name 45
task 46 with id 61 and name 46
task 49 with id 62 and name 49
task 47 with id 63 and name 47
task 48 with id 64 and name 48
task 51 with id 65 and name 51
task 52 with id 66 and name 52
task 50 with id 67 and name 50
task 85 with id 68 and name 85
task 53 with id 69 and name 53
task 54 with id 70 and name 54
task 55 with id 71 and name 55
task 56 with id 72 and name 56
task 57 with id 73 and name 57
task 58 with id 74 and name 58
task 59 with id 75 and name 59
task 60 with id 76 and name 60
task 84 with id 77 and name 84
task 61 with id 78 and name 61
task 62 with id 79 and name 62
task 63 with id 80 and name 63
task 64 with id 81 and name 64
task 65 with id 82 and name 65
task 70 with id 88 and name 70
task 71 with id 89 and name 71
task 67 with id 84 and name 67
task 69 with id 87 and name 69
task 72 with id 90 and name 72
task 73 with id 91 and name 73
task 74 with id 92 and name 74
task 66 with id 83 and name 66
task 75 with id 93 and name 75
task 76 with id 94 and name 76
task 77 with id 95 and name 77
task 78 with id 96 and name 78
100 tasks completed
Start waiting...
task 79 with id 97 and name 79
task 83 with id 86 and name 83
task 68 with id 85 and name 68
task 80 with id 98 and name 80
task 81 with id 99 and name 81
task 82 with id 100 and name 82
Finish waiting...

tofutim commented Nov 13, 2015

It turns out that Task.Factory.StartNew does not work well with Task.WhenAll. We should be using Task.Run. When this is done, voila, no need for a Delay.

...
task 80 with id 98 and name 80
task 78 with id 99 and name 78
task 79 with id 100 and name 79
100 tasks completed
Start waiting...
Finish waiting...

tofutim commented Nov 13, 2015

http://stackoverflow.com/questions/18873069/task-whenall-have-finished-before-the-tasks-it-contains-were-finished

"As commented by Svick, Task.Factory.StartNew has different overloads and functionality to that of Task.Run these differences have been blogged about here: Task.Run vs Task.Factory.StartNew. The result of Task.Factory.StartNew(async () => ... is a Task, so when calling Task.WhenAll you are only waiting for the outer Task to complete."
