Parsing a feed with quotes... #36

Open
ukadiyala opened this issue Jul 26, 2016 · 10 comments

Comments

@ukadiyala

Hi There,

I'm trying to parse a feed which utilises quotes ("") in addition to commas in every file. Would you have an example of how I could configure the reader to discard the quotes?

All files seem to be parsing except the calendar file. Here is a sample of what it looks like:

service_id,monday,tuesday,wednesday,thursday,friday,saturday,sunday,start_date,end_date
FULLW,1,1,1,1,1,1,1,20160714,20161014
WE,0,0,0,0,0,1,1,20160714,20161014
"Z1+1","1","1","1","1","1","1","1","20160714","20161014"

The first two data lines parse; they come from the test file. The last line, which comes from the feed, does not, and I get an error message saying:
"Could not parse value "20161014" in field end_date in file calendar.".

I'm pretty sure there is a configuration item I'm missing. This also makes me wonder about the rest of the data. Can you please help?

Regards,
Udhay

@xivk
Contributor

xivk commented Jul 27, 2016

Can you provide a sample feed or build a unit test that simulates this? That will make it a lot easier to track down this issue...

@ukadiyala
Author

ukadiyala commented Jul 27, 2016

Hi There... Thank you for the prompt response... Attached is a sample .zip extract from a wider set I'm working with... I also noticed the same problem with the calendar_dates file...

Thanks and look forward to hearing from you soon...

Sample.zip

@ukadiyala
Author

Hi there... I'm hoping that you have managed to open the sample files and reproduce the issue I'm facing... any recommendations from your end?

@ukadiyala
Author

ukadiyala commented Aug 1, 2016

Hi There,

Hope all is well on your end. I have not heard back from you, so I took the liberty of replicating your source code on my machine and stepping through it.

The problem seems to be in the MoveNext() method of the CSVStreamReader class. Looking further into it, the 'line' variable carrying the new line seems to have an additional '' after the ".

I was wondering if I could configure it to use a line pre-processor to solve this problem. Your thoughts?

Regards,
Udhay

@ukadiyala
Author

ukadiyala commented Aug 1, 2016

Done... Solved it with a line pre-processor delegate...

Used the code below to configure it...
reader.LinePreprocessor = delegate (string s) { return s.Replace("\"", ""); };

Also, was wondering if you have any samples handy of the invalid feeds you are accounting for in the MoveNext() method... I wonder if looping around might be the best thing to do in a portable class library...

Do let me know...
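
For anyone trying the same workaround, here is a minimal standalone sketch of what a quote-stripping pre-processor does to a line. It does not use the library at all: `StripQuotes` is a hypothetical stand-in for the delegate above, and the `Func<string, string>` shape and the second sample line are assumptions made for illustration. It suggests the approach is only safe when no quoted field contains a comma.

```csharp
using System;

public static class QuoteStrippingDemo
{
    // Hypothetical stand-in for the delegate assigned to LinePreprocessor above.
    public static readonly Func<string, string> StripQuotes =
        s => s.Replace("\"", "");

    public static void Main()
    {
        // The quoted calendar line from the feed: no field contains a comma,
        // so stripping every quote leaves a clean ten-column line.
        var calendarLine =
            "\"Z1+1\",\"1\",\"1\",\"1\",\"1\",\"1\",\"1\",\"1\",\"20160714\",\"20161014\"";
        Console.WriteLine(StripQuotes(calendarLine));
        // prints: Z1+1,1,1,1,1,1,1,1,20160714,20161014

        // A made-up two-column line whose second field contains a comma.
        // Once the quotes are gone, a plain split sees three columns.
        var lineWithComma = "\"S1\",\"Main St, Platform 2\"";
        Console.WriteLine(StripQuotes(lineWithComma).Split(',').Length);
        // prints: 3
    }
}
```

So stripping quotes up front trades one parsing problem for another whenever a feed relies on quoting to protect embedded commas.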

@xivk
Contributor

xivk commented Aug 2, 2016

Sorry, I'm maintaining this in my spare time, so I didn't have time to check this. Are you using Mono on OSX/Linux or .NET on Windows?

@ukadiyala
Author

ukadiyala commented Aug 2, 2016

No worries... We are all busy people... I understand...
You seem to have written a good library here... Happy to contribute...
I'm utilising .NET on Windows...

@ukadiyala
Author

ukadiyala commented Aug 5, 2016

NOT Fixed... Damn it!!! Still an issue. I have done a full circle and come back to the beginning...

@xivk
Contributor

xivk commented Aug 5, 2016

As I said, I'm doing this in my spare time and it can take a while... you can always submit a pull request and I can create a new build/package for you...

@simon-meer

simon-meer commented Oct 5, 2017

Just stumbled upon the same issue myself -- looks like CSVStreamReader doesn't correctly handle the case when the last item is in quotes.

https://github.com/itinero/GTFS/blob/develop/src/GTFS/IO/CSV/CSVStreamReader.cs#L155

Apparently, the last item is taken "as-is" (substring starting from the position after the last comma to the end) as opposed to the rest, which is correctly checked for quotes.
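
To illustrate the point, here is a minimal sketch of a line splitter that runs the last field through the same quote handling as every other field. This is not the library's code and not a proposed patch: `CsvLineSplitter` and `Split` are hypothetical names, and treating a doubled quote inside a quoted field as an escaped quote is an assumption borrowed from common CSV conventions.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class CsvLineSplitter
{
    // Splits one CSV line into fields. Quotes around a field are dropped,
    // separators inside quotes are kept, and the final field goes through
    // the same logic as the rest instead of being taken "as-is".
    public static List<string> Split(string line, char separator = ',')
    {
        var fields = new List<string>();
        var current = new StringBuilder();
        var inQuotes = false;

        for (var i = 0; i < line.Length; i++)
        {
            var c = line[i];
            if (c == '"')
            {
                if (inQuotes && i + 1 < line.Length && line[i + 1] == '"')
                {
                    // Doubled quote inside a quoted field: keep one literal quote.
                    current.Append('"');
                    i++;
                }
                else
                {
                    // Opening or closing quote: toggle state, emit nothing.
                    inQuotes = !inQuotes;
                }
            }
            else if (c == separator && !inQuotes)
            {
                // Unquoted separator ends the current field.
                fields.Add(current.ToString());
                current.Clear();
            }
            else
            {
                current.Append(c);
            }
        }

        // The last field is flushed here, after the same quote handling.
        fields.Add(current.ToString());
        return fields;
    }
}
```

With this sketch, `CsvLineSplitter.Split("\"Z1+1\",\"1\",\"20161014\"")` yields `Z1+1`, `1` and `20161014`, so the trailing `"20161014"` from the calendar sample would reach the date parser without its quotes.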
