Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Filters fail to work correctly against calculated numeric columns returned by SQL views because type affinity rules do not apply #1671

Open
rayvoelker opened this issue Mar 20, 2022 · 8 comments
Labels

Comments

@rayvoelker
Copy link

I found a strange behavior, and I'm not sure if it's related to views and boolean values perhaps, or if there's something else weird going on here, but I'll provide an example that may help show what I'm seeing happen.

#!/bin/bash

echo "\"id\",\"expiration_date\"
0,2018-01-04
1,2019-01-05
2,2020-01-06
3,2021-01-07
4,2022-01-08
5,2023-01-09
6,2024-01-10
7,2025-01-11
8,2026-01-12
9,2027-01-13
" > test.csv
csvs-to-sqlite test.csv test.db
sqlite-utils create-view --replace test.db test_view "select id, expiration_date, case when julianday('NOW') >= julianday(expiration_date) then 1 else 0 end as has_expired FROM test"
datasette test.db

image

image

image

image

Thanks again and let me know if you want me to provide anything else!

@simonw
Copy link
Owner

simonw commented Mar 21, 2022

Oh this is fascinating! I replicated the bug (thanks for the steps to reproduce) and it looks like this is down to the following:

image

Against views, where has_expired = 1 returns different results from where has_expired = '1'

This doesn't happen against tables because of SQLite's type affinity mechanism, which handles the type conversion automatically.

@simonw simonw added the bug label Mar 21, 2022
@simonw simonw changed the title Error with facets or parameterized filters on Boolean values ... Filters fail to work correctly against calculated numeric columns returned by SQL views because type affinity rules do not apply Mar 21, 2022
@simonw
Copy link
Owner

simonw commented Mar 21, 2022

Relevant section of the SQLite documentation: 3.2. Affinity Of Expressions:

When an expression is a simple reference to a column of a real table (not a VIEW or subquery) then the expression has the same affinity as the table column.

In your example, has_expired is no longer a simple reference to a column of a real table, hence the bug.

Then 4.2. Type Conversions Prior To Comparison fills in the rest:

SQLite may attempt to convert values between the storage classes INTEGER, REAL, and/or TEXT before performing a comparison. Whether or not any conversions are attempted before the comparison takes place depends on the type affinity of the operands.

@simonw
Copy link
Owner

simonw commented Mar 21, 2022

I wonder if this will be a problem with generated columns, or with SQLite strict tables?

My hunch is that strict tables will continue to work without any changes, because https://www.sqlite.org/stricttables.html says nothing about their impact on comparison operations. I should test this to make absolutely sure though.

Generated columns have a type, so my hunch is they will continue to work fine too.

@simonw
Copy link
Owner

simonw commented Mar 21, 2022

Thinking about options for fixing this...

The following query works fine:

select * from test_view where cast(has_expired as text) = '1'

I don't want to start using this for every query, because one of the goals of Datasette is to help people who are learning SQL:

If someone clicks on "View and edit SQL" from a filtered table page I don't want them to have to wonder why that cast is there.

But... for querying views, the cast turns out to be necessary.

So one fix would be to get the SQL generating logic to use casts like this any time it is operating against a view.

An even better fix would be to detect which columns in a view come from a table and which ones might not, and only use casts for the columns that aren't definitely from a table.

The trick I was exploring here might be able to help with that:

@simonw
Copy link
Owner

simonw commented Mar 22, 2022

The alternative to using cast here would be for Datasette to convert the "1" to a 1 in Python code before passing it as a param.

This feels a bit neater to me, but I still then need to solve the problem of how to identify the "type" of a column that I want to use in a query.

@simonw
Copy link
Owner

simonw commented Mar 22, 2022

No, I think I need to use cast - I can't think of any way to ask SQLite "for this query, what types are the columns that will come back from it?"

Even the details from the explain trick explored in #1293 don't seem to come back with column type information: https://latest.datasette.io/fixtures?sql=explain+select+pk%2C+text1%2C+text2%2C+[name+with+.+and+spaces]+from+searchable_view+where+%22pk%22+%3D+%3Ap0&p0=1

@simonw
Copy link
Owner

simonw commented Mar 22, 2022

Note that Datasette does already have special logic to convert parameters to integers for numeric comparisons like >:

def where_clause(self, table, column, value, param_counter):
converted = self.format.format(value)
if self.numeric and converted.isdigit():
converted = int(converted)
if self.no_argument:
kwargs = {"c": column}
converted = None
else:
kwargs = {"c": column, "p": f"p{param_counter}", "t": table}
return self.sql_template.format(**kwargs), converted

Though... it looks like there's a bug in that? It doesn't account for float values - "3.5".isdigit() return False - probably for the best, because int(3.5) would break that value anyway.

@simonw
Copy link
Owner

simonw commented Mar 22, 2022

Also made me realize that this query:

select * from sortable where sortable > :p0

Only works here thanks to the column affinity thing kicking in too: https://latest.datasette.io/fixtures?sql=select+*+from+sortable+where+sortable+%3E+%3Ap0&p0=70

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants