Update paper.md
beckyperriment authored Apr 25, 2024
1 parent d03081a commit 90c6aed
Showing 1 changed file (joss/paper.md) with 2 additions and 2 deletions.
@@ -69,7 +69,7 @@ The DTW distance is the sum of the Euclidean distance between each point and its
2. Only unidirectional forward movement through relative time is allowed, i.e., if $x_1$ is mapped to $y_2$ then $x_2$ may not be mapped to $y_1$ (monotonicity).
3. Each point is mapped to at least one other point, i.e., there are no jumps in time (continuity).
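Only items 2 and 3 of this list fall inside the hunk. The sketch below checks a candidate warping path against them, assuming that item 1 (outside the hunk) is the usual DTW boundary condition that the first points and the last points of the two series are aligned. The helper `is_valid_warping_path` and its 0-based `(i, j)` pair representation are hypothetical illustrations, not part of DTW-C++.

```python
def is_valid_warping_path(path, n, m):
    """Check a warping path against the DTW constraints (hypothetical helper).

    `path` is a list of 0-based (i, j) pairs meaning x[i] is mapped to y[j];
    n and m are the lengths of the two series.
    """
    # Boundary (assumed to be item 1): first points aligned, last points aligned.
    if not path or path[0] != (0, 0) or path[-1] != (n - 1, m - 1):
        return False
    allowed_steps = {(1, 0), (0, 1), (1, 1)}
    for (i0, j0), (i1, j1) in zip(path, path[1:]):
        # Monotonicity: neither index may decrease (no crossing mappings).
        # Continuity: neither index may advance by more than one (no jumps).
        if (i1 - i0, j1 - j0) not in allowed_steps:
            return False
    return True


if __name__ == "__main__":
    print(is_valid_warping_path([(0, 0), (1, 1), (1, 2), (2, 3)], 3, 4))  # True
    # x[0] -> y[1] followed by x[1] -> y[0] breaks monotonicity.
    print(is_valid_warping_path([(0, 0), (0, 1), (1, 0), (1, 1), (2, 2), (2, 3)], 3, 4))  # False
    # Jumping from (0, 0) to (2, 2) skips x[1] and y[1], breaking continuity.
    print(is_valid_warping_path([(0, 0), (2, 2), (2, 3)], 3, 4))  # False
```

Restricting each step to `(1, 0)`, `(0, 1)` or `(1, 1)` enforces monotonicity and continuity at once: a negative step would cross an earlier mapping, and a step larger than one would skip a point.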

-![(a) Two time series with DTW pairwise alignment between each point, showing the one-to-many mapping properties of DTW. (b) Cost matrix $C$ for the two time series, showing the warping path and final DTW cost at $C_{14,13}$. \label{fig:warping_signals}](../media/warping_merged_cropped.pdf)
+![(a) Two time series with DTW pairwise alignment between each point, showing the one-to-many mapping properties of DTW. (b) Cost matrix $C$ for the two time series, showing the warping path and final DTW cost at $c_{14,13}$. \label{fig:warping_signals}](../media/warping_merged_cropped.pdf)
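The warping path drawn in panel (b) can be recovered from a filled cumulative cost matrix by walking back from the last cell to the first. The sketch below is a hedged illustration, not DTW-C++ code: it uses 0-based indices (so, assuming the paper's indices start at 1, the caption's $c_{14,13}$ would be `C[13][12]`), and the function name `backtrack_path` and the diagonal-first tie-break are assumptions made for illustration.

```python
def backtrack_path(C):
    """Recover a warping path from a filled cumulative cost matrix C (0-based).

    Walks from the last cell back to (0, 0), always stepping to the cheapest
    of the three allowed predecessors; ties prefer the diagonal.
    """
    i, j = len(C) - 1, len(C[0]) - 1
    path = [(i, j)]
    while (i, j) != (0, 0):
        if i == 0:
            j -= 1  # first row: only a horizontal step is possible
        elif j == 0:
            i -= 1  # first column: only a vertical step is possible
        else:
            candidates = [
                (C[i - 1][j - 1], (i - 1, j - 1)),  # diagonal listed first
                (C[i - 1][j], (i - 1, j)),
                (C[i][j - 1], (i, j - 1)),
            ]
            # min() keeps the first minimum, so ties go to the diagonal.
            _, (i, j) = min(candidates, key=lambda t: t[0])
        path.append((i, j))
    path.reverse()
    return path


if __name__ == "__main__":
    # Cumulative costs for x = [0, 1, 2], y = [0, 1, 1, 2] with |x_i - y_j| as the local cost.
    C = [[0, 1, 2, 4],
         [1, 0, 0, 1],
         [3, 1, 1, 0]]
    print(backtrack_path(C))  # [(0, 0), (1, 1), (1, 2), (2, 3)]
```

Backtracking needs the whole matrix to be kept in memory, whereas the final cost alone can be obtained from a bottom-right cell without storing the path.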

Finding the optimal warping arrangement is an optimisation problem that can be solved using dynamic programming, which splits the problem into easier sub-problems and solves them recursively, storing intermediate solutions until the final solution is reached. To understand the memory-efficient method used in ``DTW-C++``, it is useful to first examine the full cost matrix solution, as follows. For each pairwise comparison, an $n$ by $m$ matrix $C^{n\times m}$ is calculated, where each element represents the cumulative cost between series up to the points $x_i$ and $y_j$:
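The cumulative-cost equation this sentence introduces lies outside the hunk. As a hedged sketch, assuming the standard DTW recurrence $c_{i,j} = d(x_i, y_j) + \min(c_{i-1,j-1}, c_{i-1,j}, c_{i,j-1})$ and scalar series with $d(a, b) = |a - b|$, the block below fills the full matrix and, for comparison, computes the same final cost while keeping only two rows. The two-row variant is one common memory-saving idea shown as an assumption; it is not necessarily the memory-efficient method DTW-C++ uses, and all function names are hypothetical.

```python
import math


def dist(a, b):
    # Assumption: scalar-valued series, so the Euclidean distance is |a - b|.
    return abs(a - b)


def dtw_full(x, y):
    """Fill the full n-by-m cumulative cost matrix C (0-based indices)."""
    n, m = len(x), len(y)
    C = [[math.inf] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            if i == 0 and j == 0:
                C[i][j] = dist(x[0], y[0])  # both series start together
                continue
            best_prev = min(
                C[i - 1][j - 1] if i > 0 and j > 0 else math.inf,  # diagonal
                C[i - 1][j] if i > 0 else math.inf,  # x advances, y_j reused
                C[i][j - 1] if j > 0 else math.inf,  # y advances, x_i reused
            )
            C[i][j] = dist(x[i], y[j]) + best_prev
    return C


def dtw_two_rows(x, y):
    """Same final DTW cost, keeping only the previous and current rows."""
    m = len(y)
    prev = [math.inf] * m  # row i-1 (all infinite before the first row)
    for i in range(len(x)):
        curr = [math.inf] * m
        for j in range(m):
            if i == 0 and j == 0:
                curr[j] = dist(x[0], y[0])
                continue
            best_prev = min(
                prev[j - 1] if j > 0 else math.inf,
                prev[j],
                curr[j - 1] if j > 0 else math.inf,
            )
            curr[j] = dist(x[i], y[j]) + best_prev
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    x, y = [0, 1, 2], [0, 1, 1, 2]
    print(dtw_full(x, y)[-1][-1], dtw_two_rows(x, y))  # 0 0
```

The full matrix needs $O(nm)$ memory, while the two-row version needs $O(m)$ and returns the same final cost, at the price of no longer being able to trace the warping path back through the matrix.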

@@ -169,7 +169,7 @@ UWaveGestureLibraryAll & 3582 & 945 & N/A & \textbf{1194.6} & 443
\end{table}


-As can be seen in these results, \texttt{DTW-C++} is the fastest package for 90\% of the datasets, and all 13 datasets where \texttt{DTAIDistance} was faster were cases where the entire clustering process was completed in 1.06 seconds or less. Across the whole collection of datasets, \texttt{DTW-C++} was on average 32\% faster. For larger datasets, with $N > 1000$, \texttt{DTW-C++} is on average 65\% faster. In all but 2 of the 115 cases where \texttt{DTW-C++} is the fastest, the k-medoids algorithm was used for clustering. \autoref{fig:k_med} shows how the speed advantage of \texttt{DTW-C++} grows as the number of time series increases. In this comparison, both packages used k-medoids, so the speed improvement is due to the faster dynamic time warping method in \texttt{DTW-C++}.
+\texttt{DTW-C++} is the fastest package for 90\% of the datasets, and all 13 datasets where \texttt{DTAIDistance} was faster were cases where the entire clustering process was completed in 1.06 seconds or less. Across the whole collection of datasets, \texttt{DTW-C++} was on average 32\% faster. For larger datasets, with $N > 1000$, \texttt{DTW-C++} is on average 65\% faster. In all but 2 of the 115 cases where \texttt{DTW-C++} is the fastest, the k-medoids algorithm was used for clustering. \autoref{fig:k_med} shows how the speed advantage of \texttt{DTW-C++} grows as the number of time series increases. In this comparison, both packages used k-medoids, so the speed improvement is due to the faster dynamic time warping method in \texttt{DTW-C++}.
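The k-medoids step can be sketched on a precomputed matrix `D` of pairwise DTW distances. This is a hedged illustration using the simple alternating (assign, then update) variant of k-medoids; the function name `k_medoids`, the caller-supplied initial medoids and the convergence test are assumptions, and DTW-C++ may use a different variant.

```python
def k_medoids(D, k, init, max_iter=100):
    """Alternating k-medoids on a precomputed distance matrix D (list of lists).

    Hypothetical helper: `init` is a list of k starting medoid indices.
    Returns (sorted medoid indices, medoid index assigned to each series).
    """
    assert len(init) == k, "need exactly k initial medoids"
    medoids = list(init)
    for _ in range(max_iter):
        # Assignment step: every series joins the cluster of its nearest medoid.
        clusters = {m: [] for m in medoids}
        for p in range(len(D)):
            nearest = min(medoids, key=lambda m: D[p][m])
            clusters[nearest].append(p)
        # Update step: each cluster's new medoid is the member with the
        # smallest total distance to the other members of that cluster.
        new_medoids = [
            min(members, key=lambda c: sum(D[c][q] for q in members))
            for members in clusters.values()
            if members  # skip clusters that ended up empty
        ]
        if sorted(new_medoids) == sorted(medoids):
            break  # converged: the medoids no longer change
        medoids = new_medoids
    labels = [min(medoids, key=lambda m: D[p][m]) for p in range(len(D))]
    return sorted(medoids), labels


if __name__ == "__main__":
    points = [0, 1, 2, 10, 11, 12]  # scalar stand-ins for six time series
    D = [[abs(a - b) for b in points] for a in points]
    print(k_medoids(D, 2, init=[0, 1]))  # ([1, 4], [1, 1, 1, 4, 4, 4])
```

Because the clustering only ever reads `D`, a faster DTW routine shortens the time spent building `D` without changing the clustering step itself, which is consistent with attributing the speed-up to the DTW computation.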

With respect to clustering, \texttt{DTW-C++} with integer programming was on average 16 times slower than \texttt{DTAIDistance} over all samples, and it becomes increasingly slower as the number of time series increases (\autoref{fig:speed_IP}). This is to be expected because the computational complexity of the integer programming optimisation grows rapidly with the number of time series in the clustering problem. However, as the length of each time series increases, the speed of integer programming clustering converges to that of \texttt{DTAIDistance}, and integer programming has the advantage of finding globally optimal solutions. Therefore, the integer programming approach is recommended when the individual time series to be clustered are very long but few in number (e.g., fewer than 1000).
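To make the scaling argument concrete, the block below gives a standard p-median style integer programme for clustering $N$ series into $k$ clusters from precomputed pairwise DTW distances $d_{ij}$. This formulation is an assumption for illustration and may differ from the one DTW-C++ actually solves. Here $y_i = 1$ if series $i$ is chosen as a cluster centre, and $x_{ij} = 1$ if series $j$ is assigned to centre $i$.

```latex
\begin{aligned}
\min_{x,\,y}\quad & \sum_{i=1}^{N}\sum_{j=1}^{N} d_{ij}\,x_{ij} \\
\text{s.t.}\quad  & \sum_{i=1}^{N} x_{ij} = 1, && j = 1,\dots,N \\
                  & x_{ij} \le y_i, && i, j = 1,\dots,N \\
                  & \sum_{i=1}^{N} y_i = k, \\
                  & x_{ij},\; y_i \in \{0,1\}.
\end{aligned}
```

This programme has $N^2 + N$ binary variables, so $N = 1000$ already gives $1{,}001{,}000$ of them, which is consistent with the solve time climbing steeply with the number of series. The length of each series does not appear in the programme at all: it only affects the cost of computing the $N(N-1)/2$ distances $d_{ij}$, which is one plausible reason why the gap to \texttt{DTAIDistance} narrows as the series get longer.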
