Strings encased in quotes are not properly unescaped/trimmed #79

Open
domints opened this issue Sep 6, 2024 · 0 comments

domints commented Sep 6, 2024

Example feed: https://otwartedane.metropoliagzm.pl/dataset/rozklady-jazdy-i-lokalizacja-przystankow-gtfs-wersja-rozszerzona/resource/290298ce-944b-4744-8f92-29ab2b786a33

Essentially, the CSV deserializer does not properly handle strings that are enclosed in quotation marks ("). I saw that this was a problem with colors in version 1.7; in this 3.0 beta colors are fine, but now the block_id field is affected.
Granted, it may not always make sense to have quotation marks within an ID, since it is just an ID, but the GTFS docs say:

ID - An ID field value is an internal ID, not intended to be shown to riders, and is a sequence of any UTF-8 characters.
Using only printable ASCII characters is recommended.

So an ID can technically contain quotation marks. Also, Busman, a scheduling system widely used in Poland, seems to enclose every string in quotation marks, which breaks this lib.

I'd suggest treating any string-like field as a string and, if it is enclosed in quotation marks, handling it properly. Doesn't this lib reference any well-known, well-tested CSV deserialization library?
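
To illustrate the behavior I would expect, here is a minimal sketch in Python using the standard csv module. It is not this lib's code or API. The sample rows and the read_block_ids helper are made up for illustration; they only assume a trips.txt-like file where some fields are quoted and one block_id contains an escaped quote.

```python
import csv
import io

# Hypothetical trips.txt excerpt (not taken from the example feed).
# Rows 1 and 2 wrap every field in quotes; row 2 also contains an
# escaped quote ("" inside a quoted field); row 3 is unquoted.
SAMPLE_TRIPS = (
    'route_id,service_id,trip_id,block_id\r\n'
    '"R1","S1","T1","B-100"\r\n'
    '"R1","S1","T2","B ""night"" 7"\r\n'
    'R2,S1,T3,B-200\r\n'
)


def read_block_ids(text):
    """Return the block_id of every row, with CSV quoting removed."""
    # newline="" leaves line endings untouched so the csv module
    # can handle them itself.
    reader = csv.DictReader(io.StringIO(text, newline=""))
    return [row["block_id"] for row in reader]


print(read_block_ids(SAMPLE_TRIPS))
# ['B-100', 'B "night" 7', 'B-200']
```

The point is that quoted and unquoted fields come out the same way: the enclosing quotes are dropped and a doubled quote inside a quoted field becomes a single literal quote, regardless of which column the value sits in.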
