Add “updated” to metadata #111

Open
simonw opened this issue Nov 16, 2017 · 14 comments

@simonw
Owner

simonw commented Nov 16, 2017

To give an indication as to when the data was last updated.

This should be a field in the metadata that is then shown on the index page and in the footer, if it is set.

Also support setting it using an option to “datasette publish” and “datasette package”. The value can be either a date string or the magic string “today”, which sets it to today’s date:

datasette publish file.db --updated=today
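
A minimal sketch of how the "today" magic string might be resolved before the value is written to the metadata. `resolve_updated` is a hypothetical helper, not an existing Datasette function, and the `--updated` option is only the proposal from this issue.

```python
from datetime import date


def resolve_updated(value, today=None):
    """Return the string to store under the "updated" metadata key.

    Hypothetical helper: "today" is swapped for the current date in
    yyyy-mm-dd form; any other string is passed through unchanged.
    """
    if value == "today":
        # `today` can be injected so the function is easy to test
        return (today or date.today()).isoformat()
    return value
```

Passing the date in explicitly, as in `resolve_updated("today", today=date(2017, 11, 16))`, keeps the behaviour deterministic in tests.
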
@simonw
Owner Author

simonw commented Nov 16, 2017

Having this as a global option may not make sense when publishing multiple databases. We can revisit that when we implement per-database and per-table metadata.

@simonw
Owner Author

simonw commented Jun 14, 2019

We have per-database and per-table metadata now. I think it's time to make this actually happen.

@simonw simonw self-assigned this Jun 14, 2019
@simonw
Owner Author

simonw commented Jun 14, 2019

I think I'll just call it "updated" to avoid the ugly underscore.

@simonw
Owner Author

simonw commented Jun 14, 2019

This may be the feature that causes me to add dateutil as a dependency (so I can use dateutil.parser.parse).

@simonw simonw removed their assignment Jun 24, 2019
@simonw
Owner Author

simonw commented Sep 23, 2020

This is still a good idea.

@simonw simonw pinned this issue Sep 23, 2020
@simonw simonw unpinned this issue Oct 15, 2020
@simonw
Owner Author

simonw commented Dec 4, 2020

This is STILL a good idea.

@simonw
Owner Author

simonw commented Sep 20, 2021

Still a good idea today too! Would be great for https://cdc-vaccination-history.datasette.io/ for example.

@simonw
Owner Author

simonw commented Sep 21, 2021

I'm going to use https://github.com/dateutil/dateutil for this - it's been maintained constantly (by an evolving team of contributors) since 2003 and is a very trustworthy dependency.

@simonw
Owner Author

simonw commented Sep 21, 2021

So this is a metadata key called updated which can be applied at the table, database or instance level. It is represented as a .isoformat() timestamp.
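
A rough sketch of how a lookup across the three levels could work, assuming the usual nested `databases` / `tables` layout of Datasette metadata. The fallback order (table, then database, then instance) is an assumption, as are the helper name `find_updated` and the example database and table names.

```python
def find_updated(metadata, database=None, table=None):
    """Return the most specific "updated" value, or None if unset."""
    db_meta = metadata.get("databases", {}).get(database, {}) if database else {}
    table_meta = db_meta.get("tables", {}).get(table, {}) if table else {}
    # Assumed precedence: table first, then database, then instance root
    for level in (table_meta, db_meta, metadata):
        if "updated" in level:
            return level["updated"]
    return None


# Illustrative metadata only; the names and timestamps are made up
metadata = {
    "updated": "2021-09-01T00:00:00+00:00",
    "databases": {
        "cdc": {
            "updated": "2021-09-20T12:00:00+00:00",
            "tables": {"daily_reports": {"updated": "2021-09-21T08:30:00+00:00"}},
        }
    },
}
```
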

Question: should I support just the date - yyyy-mm-dd - in addition to the datetime?

I think so. I can easily imagine situations where the exact time of day that a change was made hasn't been recorded, but the overall date is known.

But in that case, should the updated key sometimes be yyyy-mm-dd and sometimes be the full isoformat datetime? Or should there be an updated_date key that's used for just the date?
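
One way the single-key option could look: a sketch in which the same `updated` key accepts either form and the parser tells them apart by length. It uses the standard library's `fromisoformat` methods rather than `dateutil.parser.parse`, and `parse_updated` is a hypothetical name.

```python
from datetime import date, datetime


def parse_updated(value):
    """Parse an "updated" string into a date or a datetime.

    A 10-character value is treated as yyyy-mm-dd and returned as a
    date; anything else is parsed as a full ISO 8601 datetime.
    """
    if len(value) == 10:
        return date.fromisoformat(value)
    return datetime.fromisoformat(value)
```

Callers that need to know whether a time of day was recorded can check `isinstance(result, datetime)`, since a plain `date` is returned for date-only values.
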

@simonw
Owner Author

simonw commented Sep 21, 2021

Side-note: Django 4.0 will switch from using pytz to using the standard library zoneinfo module introduced in Python 3.9, which has a backport that works as far back as 3.6: https://github.com/pganssle/zoneinfo (https://pypi.org/project/backports.zoneinfo/)

If I need to handle timezones I'll use that, but I think I can get away without it?

Django does this: https://github.com/django/django/blob/b0ed619303d2fb723330ca9efa3acf23d49f1d19/setup.cfg#L39-L43

install_requires =
    asgiref >= 3.3.2
    backports.zoneinfo; python_version<"3.9"
    sqlparse >= 0.2.2
    tzdata; sys_platform == 'win32'

@simonw
Owner Author

simonw commented Sep 21, 2021

The simplest possible version of this: it's always represented as a UTC ISO datetime, like this:

"updated": "2020-10-31T12:00:00+00:00"

Later versions of Datasette could extend this to handle other timezones or support just the date (though that's a backwards incompatible change so probably better to decide on the date thing right now).
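
A small sketch of producing that UTC representation with the standard library. `utc_isoformat` is a hypothetical helper, and treating naive datetimes as already being in UTC is an assumption made here for illustration.

```python
from datetime import datetime, timedelta, timezone


def utc_isoformat(dt):
    """Format a datetime as a UTC ISO 8601 string with a +00:00 offset."""
    if dt.tzinfo is None:
        # Assumption: naive datetimes are already in UTC
        dt = dt.replace(tzinfo=timezone.utc)
    return dt.astimezone(timezone.utc).isoformat()


print(utc_isoformat(datetime(2020, 10, 31, 12, 0, 0)))
# 2020-10-31T12:00:00+00:00

# A datetime with a -05:00 offset is converted to the same UTC instant
eastern = timezone(timedelta(hours=-5))
print(utc_isoformat(datetime(2020, 10, 31, 7, 0, 0, tzinfo=eastern)))
# 2020-10-31T12:00:00+00:00
```
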

@simonw
Owner Author

simonw commented Sep 21, 2021

The audiences I care about here are:

  • Producers of this timestamp - generally that will be users who are using datasette publish to share their data
  • Human consumers of this timestamp - end users who look at a Datasette site and want to know how recent the data is
  • Machine consumers of this timestamp - API integrations that might want to check if a Datasette instance has been updated before downloading new data

For producers I think there are going to be two categories. The first is users who run "publish" and want the site to reflect when they did so (probably using --updated=now when they publish). The second is users who are willing to spend more time thinking about this, for example my various git scraping projects where I want to use a date derived from the git history.

For humans... I'd like to be able to calculate a relative time (3 hours ago) in addition to showing the display time, because that helps avoid confusion over timezones.

For machine consumers, it might be nice to offer the option of a calculated Unix timestamp-since-1970, since those can be easier to work with in some languages than running a full ISO date parser.
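
A sketch of both derived values, assuming the stored `updated` string always carries a UTC offset as in the example above. The function names and the coarse day/hour/minute wording are illustrative choices, not anything Datasette defines.

```python
from datetime import datetime, timezone


def unix_timestamp(updated):
    """Seconds since 1970-01-01T00:00:00 UTC for an ISO datetime string."""
    return int(datetime.fromisoformat(updated).timestamp())


def relative_time(updated, now=None):
    """Describe how long ago `updated` was, e.g. "3 hours ago"."""
    # `now` can be injected for testing; it must be timezone-aware
    now = now or datetime.now(timezone.utc)
    seconds = int((now - datetime.fromisoformat(updated)).total_seconds())
    for unit, size in (("day", 86400), ("hour", 3600), ("minute", 60)):
        if seconds >= size:
            count = seconds // size
            return f"{count} {unit}{'' if count == 1 else 's'} ago"
    return "just now"
```

Because the relative time is computed from two timezone-aware values, the result does not depend on the viewer's local timezone.
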

@simonw simonw changed the title from “Add “last_updated” to metadata” to “Add “updated” to metadata” Sep 21, 2021
@rcaught

rcaught commented Dec 10, 2023

rcaught/querydata.io@a661d43
rcaught/querydata.io@62b8480

[Image: Screenshot 2023-12-10 at 9 18 47 pm]

I'm going to try and turn this into a proper plugin when I get the chance.

@rcaught

rcaught commented Dec 10, 2023


2 participants