pygmt.x2sys_cross: Refactor to use virtualfiles for output tables [BREAKING CHANGE: Dummy times in 3rd and 4th columns now have np.timedelta64 type] #3182
Changes from 5 commits
```diff
@@ -5,19 +5,19 @@
 import contextlib
 import os
 from pathlib import Path
+from typing import Any, Literal
 
 import pandas as pd
-from packaging.version import Version
 from pygmt.clib import Session
 from pygmt.exceptions import GMTInvalidInput
 from pygmt.helpers import (
-    GMTTempFile,
     build_arg_list,
     data_kind,
     fmt_docstring,
     kwargs_to_strings,
     unique_name,
     use_alias,
+    validate_output_table_type,
 )
```
```diff
@@ -71,7 +71,12 @@ def tempfile_from_dftrack(track, suffix):
     Z="trackvalues",
 )
 @kwargs_to_strings(R="sequence")
-def x2sys_cross(tracks=None, outfile=None, **kwargs):
+def x2sys_cross(
+    tracks=None,
+    output_type: Literal["pandas", "numpy", "file"] = "pandas",
+    outfile: str | None = None,
+    **kwargs,
+):
     r"""
     Calculate crossovers between track data files.
```
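As a quick illustration of the new signature, here is a hedged usage sketch. The tag name and track files are hypothetical and assume an x2sys tag was set up beforehand (e.g. with `pygmt.x2sys_init`):

```python
import pygmt

# Hypothetical call using the refactored signature; "MYTAG" and the
# track file names are placeholders for a pre-built x2sys setup.
crossovers = pygmt.x2sys_cross(
    tracks=["track_1.xyz", "track_2.xyz"],
    tag="MYTAG",
    output_type="pandas",  # the default; "numpy" and "file" are also accepted
)

# With output_type="file", an outfile path must be given instead:
pygmt.x2sys_cross(
    tracks=["track_1.xyz", "track_2.xyz"],
    tag="MYTAG",
    output_type="file",
    outfile="crossovers.txt",
)
```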
```diff
@@ -102,11 +107,8 @@ def x2sys_cross(tracks=None, outfile=None, **kwargs):
     set it will default to $GMT_SHAREDIR/x2sys]. (**Note**: MGD77 files
     will also be looked for via $MGD77_HOME/mgd77_paths.txt and .gmt
     files will be searched for via $GMT_SHAREDIR/mgg/gmtfile_paths).
-
-    outfile : str
-        Optional. The file name for the output ASCII txt file to store the
-        table in.
-
+    {output_type}
+    {outfile}
     tag : str
         Specify the x2sys TAG which identifies the attributes of this data
         type.
```
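The `{output_type}` and `{outfile}` placeholders are expanded by the `@fmt_docstring` decorator from PyGMT's shared docstring snippets, so wrappers document these parameters identically instead of hand-writing them. A minimal sketch of the mechanism, assuming only what the diff itself shows (both keys exist in the shared snippets):

```python
from pygmt.helpers import fmt_docstring

@fmt_docstring
def demo(output_type="pandas", outfile=None):
    """
    Parameters
    ----------
    {output_type}
    {outfile}
    """

# The placeholders in demo.__doc__ are replaced by the standard parameter docs.
print(demo.__doc__)
```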
```diff
@@ -183,68 +185,56 @@ def x2sys_cross(tracks=None, outfile=None, **kwargs):
 
     Returns
     -------
-    crossover_errors : :class:`pandas.DataFrame` or None
-        Table containing crossover error information.
-        Return type depends on whether the ``outfile`` parameter is set:
-
-        - :class:`pandas.DataFrame` with (x, y, ..., etc) if ``outfile`` is not
-          set
-        - None if ``outfile`` is set (track output will be stored in the set in
-          ``outfile``)
+    crossover_errors
+        Table containing crossover error information. Return type depends on ``outfile``
+        and ``output_type``:
+
+        - None if ``outfile`` is set (output will be stored in file set by ``outfile``)
+        - :class:`pandas.DataFrame` or :class:`numpy.ndarray` if ``outfile`` is not set
+          (depends on ``output_type``)
     """
-    with Session() as lib:
-        file_contexts = []
-        for track in tracks:
-            kind = data_kind(track)
-            if kind == "file":
+    output_type = validate_output_table_type(output_type, outfile=outfile)
+
+    file_contexts: list[contextlib.AbstractContextManager[Any]] = []
+    for track in tracks:
+        match data_kind(track):
+            case "file":
                 file_contexts.append(contextlib.nullcontext(track))
-            elif kind == "matrix":
+            case "matrix":
                 # find suffix (-E) of trackfiles used (e.g. xyz, csv, etc) from
                 # $X2SYS_HOME/TAGNAME/TAGNAME.tag file
-                lastline = (
-                    Path(os.environ["X2SYS_HOME"], kwargs["T"], f"{kwargs['T']}.tag")
-                    .read_text(encoding="utf8")
-                    .strip()
-                    .split("\n")[-1]
-                )  # e.g. "-Dxyz -Etsv -I1/1"
+                tagfile = Path(
+                    os.environ["X2SYS_HOME"], kwargs["T"], f"{kwargs['T']}.tag"
+                )
+                # Last line is like "-Dxyz -Etsv -I1/1"
+                lastline = tagfile.read_text().splitlines()[-1]
```
**Review comment:** Is `encoding="utf8"` not needed here anymore?

**Suggested change:** [restore `encoding="utf8"` in the `read_text()` call]

**Reply:** Added back.
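As context for the suffix-selection loop in the next chunk, here is a standalone sketch of the logic, using the example last line from the code comment above:

```python
# Example last line of a hypothetical $X2SYS_HOME/TAGNAME/TAGNAME.tag file.
lastline = "-Dxyz -Etsv -I1/1"

suffix = None
for item in sorted(lastline.split()):  # alphabetical: "-Dxyz" < "-Etsv" < "-I1/1"
    if item.startswith(("-E", "-D")):  # a later "-E..." match overwrites "-D..."
        suffix = item[2:]

print(suffix)  # "tsv": -Etsv wins because it is assigned after -Dxyz
```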
```diff
                 for item in sorted(lastline.split()):  # sort list alphabetically
                     if item.startswith(("-E", "-D")):  # prefer -Etsv over -Dxyz
                         suffix = item[2:]  # e.g. tsv (1st choice) or xyz (2nd choice)
 
                 # Save pandas.DataFrame track data to temporary file
                 file_contexts.append(tempfile_from_dftrack(track=track, suffix=suffix))
-            else:
+            case _:
                 raise GMTInvalidInput(f"Unrecognized data type: {type(track)}")
 
-        with GMTTempFile(suffix=".txt") as tmpfile:
+    with Session() as lib:
+        with lib.virtualfile_out(kind="dataset", fname=outfile) as vouttbl:
             with contextlib.ExitStack() as stack:
                 fnames = [stack.enter_context(c) for c in file_contexts]
-                if outfile is None:
-                    outfile = tmpfile.name
                 lib.call_module(
                     module="x2sys_cross",
                     args=build_arg_list(kwargs, infile=fnames, outfile=outfile),
+                    args=build_arg_list(kwargs, infile=fnames, outfile=vouttbl),
                 )
-
-            # Read temporary csv output to a pandas table
-            if outfile == tmpfile.name:  # if outfile isn't set, return pd.DataFrame
-                # Read the tab-separated ASCII table
-                date_format_kwarg = (
-                    {"date_format": "ISO8601"}
-                    if Version(pd.__version__) >= Version("2.0.0")
-                    else {}
-                )
-                table = pd.read_csv(
-                    tmpfile.name,
-                    sep="\t",
-                    header=2,  # Column names are on 2nd row
-                    comment=">",  # Skip the 3rd row with a ">"
-                    parse_dates=[2, 3],  # Datetimes on 3rd and 4th column
-                    **date_format_kwarg,  # Parse dates in ISO8601 format on pandas>=2
-                )
+            result = lib.virtualfile_to_dataset(
+                vfname=vouttbl, output_type=output_type, header=2
+            )
```
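For readers unfamiliar with the virtual-file machinery this refactor adopts, here is a minimal standalone sketch of the same output pattern (the tag and track file names are hypothetical; the calls mirror those in the diff above):

```python
from pygmt.clib import Session
from pygmt.helpers import build_arg_list

with Session() as lib:
    # fname=None keeps the result in memory; a real filename makes the
    # module write directly to disk and the dataset call return None.
    with lib.virtualfile_out(kind="dataset", fname=None) as vouttbl:
        lib.call_module(
            module="x2sys_cross",
            args=build_arg_list(
                {"T": "MYTAG"}, infile=["track_1.xyz", "track_2.xyz"], outfile=vouttbl
            ),
        )
        # header=2 matches the header handling of the previous read_csv-based code.
        result = lib.virtualfile_to_dataset(
            vfname=vouttbl, output_type="pandas", header=2
        )
```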
**Comment on lines -241 to 225:** Note that […]

**Reply:** Yes. The main problem is that, as far as I know, there is no equivalent way to represent a multi-segment file in pandas. The multi-segment support was also mentioned in #2729 (comment). If we can have a general way to represent multi-segments in pandas, then it should be straightforward to output multi-segments from […]

**Reply:** It's already tested in `pygmt/pygmt/datatypes/dataset.py` (line 215 in 466c8b6). For […]
```diff
-                # Remove the "# " from "# x" in the first column
-                table = table.rename(columns={table.columns[0]: table.columns[0][2:]})
-            elif outfile != tmpfile.name:  # if outfile is set, output in outfile only
-                table = None
-
-        return table
+            # Convert 3rd and 4th columns to datetimes.
+            # These two columns have names "t_1"/"t_2" or "i_1"/"i_2".
+            # "t_1"/"t_2" means they are datetimes and should be converted.
+            # "i_1"/"i_2" means they are dummy times (i.e., floating-point values).
```
**Review comment:** Am I understanding the output correctly?

**Reply:** I've never used x2sys, but here is my understanding of the C codes and the output: […]

**Review comment:** I'm a little unsure if […]. It seems like the […]

**Reply:** Dummy times are just double-precision indexes from 0 to n (xref: https://github.com/GenericMappingTools/gmt/blob/b56be20bee0b8de22a682fdcd458f9b9eeb76f64/src/x2sys/x2sys.c#L533). The column name […]. We can keep the dummy times as double-precision numbers, or treat them as seconds since the Unix epoch and then convert them to absolute times.

**Review comment:** Maybe convert the relative time to `np.timedelta64`? […]

**Reply:** Sounds good. Done in 9d12ae1.
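Following the thread's resolution (dummy times converted to timedeltas in 9d12ae1, per the PR title), here is a minimal sketch of what that conversion could look like, assuming dummy-time columns named `i_1`/`i_2` holding elapsed seconds:

```python
import pandas as pd

# Hypothetical dummy-time columns "i_1"/"i_2": double-precision indexes 0..n.
df = pd.DataFrame({"i_1": [0.0, 1.0, 2.0], "i_2": [0.0, 2.0, 4.0]})

# Treating the values as elapsed seconds and converting them to timedeltas
# keeps them clearly distinct from absolute datetimes ("t_1"/"t_2").
df[["i_1", "i_2"]] = df[["i_1", "i_2"]].apply(pd.to_timedelta, unit="s")

print(df.dtypes)  # timedelta64[ns] for both columns
```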
```diff
+            if output_type == "pandas" and result.columns[2] == "t_1":
+                result[result.columns[2:4]] = result[result.columns[2:4]].apply(
+                    pd.to_datetime, unit="s"
+                )
+            return result
```
**Review comment:** Honestly, I'm not sure if we should support the `numpy` output type for `x2sys_cross`, because all "columns" will need to be the same dtype in a `np.ndarray`. If there are datetime values in the columns, they will get converted to floating point (?), which makes it more difficult to use later. Try adding a unit test for the `numpy` output_type and see if it makes sense.

**Reply:** You're right. Datetimes are converted to floating points by `df.to_numpy()`. Will remove the `numpy` output type.
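A minimal sketch of the dtype problem discussed above: once float and datetime columns are mixed, `DataFrame.to_numpy()` must collapse everything to a single common dtype, so the per-column typing that makes the pandas output convenient is lost.

```python
import pandas as pd

# A table loosely resembling x2sys_cross output: floats plus a datetime column.
df = pd.DataFrame(
    {
        "x": [10.5, 20.1],
        "y": [-30.2, -40.8],
        "t_1": pd.to_datetime(["2020-01-01", "2020-01-02"]),
    }
)

arr = df.to_numpy()
# The float and datetime columns collapse to one common dtype, so the
# datetime64 column no longer keeps its type inside the ndarray.
print(arr.dtype)  # object
```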