Most state-of-the-art (SOTA) time series classification methods are limited by high computational complexity. This makes them slow to train on smaller datasets and effectively unusable on large datasets.
Recently, ROCKET (RandOm Convolutional KErnel Transform) has achieved SOTA accuracy in a fraction of the time required by other SOTA time series classifiers. ROCKET transforms time series into features using random convolutional kernels and passes those features to a linear classifier.
MiniRocket is even faster!
MiniRocket (MINImally RandOm Convolutional KErnel Transform) is a (nearly) deterministic reformulation of ROCKET that is up to 75 times faster on larger datasets, with roughly equivalent accuracy.
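The core idea behind ROCKET-style transforms can be sketched in a few lines of NumPy and scikit-learn. This is an illustration only, not the real ROCKET or MiniRocket (which also randomize kernel lengths, dilations, paddings, and biases, and are heavily optimized):

```python
import numpy as np
from sklearn.linear_model import RidgeClassifierCV

rng = np.random.default_rng(42)

def random_kernel_features(X, n_kernels=100):
    # Convolve each series with random kernels, summarizing each
    # convolution with two numbers: its maximum and the proportion of
    # positive values (PPV) -- the pooling statistics ROCKET uses.
    feats = np.zeros((X.shape[0], n_kernels * 2))
    for k in range(n_kernels):
        weights = rng.normal(size=9)  # fixed length-9 kernels, for simplicity
        for i, series in enumerate(X):
            conv = np.convolve(series, weights, mode="valid")
            feats[i, 2 * k] = conv.max()
            feats[i, 2 * k + 1] = (conv > 0).mean()
    return feats

# Toy problem: smooth sine waves (class 0) vs. white noise (class 1).
t = np.linspace(0, 4 * np.pi, 100)
X = np.vstack([np.sin(t + p) for p in rng.uniform(0, np.pi, 20)]
              + [rng.normal(size=100) for _ in range(20)])
y = np.array([0] * 20 + [1] * 20)

feats = random_kernel_features(X)
clf = RidgeClassifierCV(alphas=np.logspace(-3, 3, 10)).fit(feats, y)
```

Because the kernels are never trained, the only fitting happens in the fast ridge classifier, which is where the speed comes from.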
The world is inherently dynamic and nonstationary — constantly changing.
It is common for the performance of machine learning models to decline over time as data distributions and target labels ("ground truth") evolve. This is especially true for models related to people.
Thus, an essential component of machine learning systems is monitoring and adapting to such changes.
In this article, I will introduce the idea of concept drift, or regime change, then discuss three ways to handle it and what you should consider.
New tools for model monitoring are emerging, but it is still important to understand…
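One simple, widely used monitoring idea is to compare the distribution of a live window of a feature (or of model scores) against a reference window with a two-sample statistical test. A minimal sketch using SciPy's Kolmogorov–Smirnov test (the `drifted` helper and its threshold are my own illustrative choices, not any particular tool's API):

```python
import numpy as np
from scipy.stats import ks_2samp

def drifted(reference, live, alpha=0.01):
    # Two-sample Kolmogorov-Smirnov test: flag drift when the live
    # window's distribution differs significantly from the reference.
    statistic, p_value = ks_2samp(reference, live)
    return p_value < alpha

rng = np.random.default_rng(0)
reference = rng.normal(loc=0.0, scale=1.0, size=1000)

drifted(reference, reference)        # identical windows: no drift
drifted(reference, reference + 3.0)  # shifted distribution: drift
```

In practice you would run a check like this on a schedule and alert or retrain when it fires.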
Clustering is an unsupervised learning task where an algorithm groups similar data points without any “ground truth” labels. Clustering different time series into similar groups is challenging because each data point is an ordered sequence.
In a previous article, I explained how the k-means clustering algorithm can be adapted to time series by using Dynamic Time Warping, which measures the similarity between two sequences, in place of standard measures like Euclidean distance.
Unfortunately, the k-means clustering algorithm for time series can be very slow!
Hierarchical clustering is faster than k-means because it operates on a matrix of pairwise distances…
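As a concrete sketch of that pipeline: a naive dynamic-programming DTW (not an optimized implementation like those in tslearn or dtaidistance), whose pairwise distances feed straight into SciPy's hierarchical clustering:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

def dtw(a, b):
    # Classic dynamic-programming DTW: cost of the cheapest warping
    # path aligning the two sequences.
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

# Toy set: two phase-shifted sines and two offset ramps.
t = np.linspace(0, 2 * np.pi, 50)
series = [np.sin(t), np.sin(t + 0.3), t / t.max(), t / t.max() + 0.1]

n = len(series)
dist = np.zeros((n, n))
for i in range(n):
    for j in range(i + 1, n):
        dist[i, j] = dist[j, i] = dtw(series[i], series[j])

# Hierarchical clustering needs only the condensed pairwise matrix.
Z = linkage(squareform(dist), method="average")
labels = fcluster(Z, t=2, criterion="maxclust")
```

Note that the distance matrix is computed once up front, which is exactly why hierarchical clustering avoids k-means' repeated DTW evaluations against moving centroids.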
“The task of time series classification can be thought of as involving learning or detecting signals or patterns within time series associated with relevant classes.” — Dempster et al., 2020, authors of the ROCKET paper
Most time series classification methods with state-of-the-art (SOTA) accuracy have high computational complexity and scale poorly. This means they are slow to train on smaller datasets and effectively unusable on large datasets.
A common task for time series machine learning is classification. Given a set of time series with class labels, can we train a model to accurately predict the class of new time series?
In machine learning with time series, using features extracted from the series is more powerful than simply treating the series in tabular form, with each date/timestamp in a separate column. Such features can capture characteristics of a series, such as trend and autocorrelation.
But… what sorts of features can you extract and how do you select among them?
In this article, I discuss the findings of two papers that analyze feature-based representations of time series. …
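To make "features extracted from series" concrete, here is a hand-rolled example of the kind of features meant; real feature libraries such as tsfresh or catch22 compute many more, and the names below are my own:

```python
import numpy as np

def ts_features(series):
    # A few simple, interpretable summary features: level, spread,
    # linear-trend slope, and lag-1 autocorrelation.
    x = np.asarray(series, dtype=float)
    t = np.arange(len(x))
    slope = np.polyfit(t, x, deg=1)[0]          # least-squares trend
    acf1 = np.corrcoef(x[:-1], x[1:])[0, 1]     # lag-1 autocorrelation
    return {"mean": x.mean(), "std": x.std(),
            "trend_slope": slope, "acf_lag1": acf1}

ts_features(2.0 * np.arange(20) + 3.0)  # a pure linear trend
```

Each series collapses to a fixed-length feature vector, which any tabular model can then consume.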
Clustering is an unsupervised learning task where an algorithm groups similar data points without any “ground truth” labels. Similarity between data points is measured with a distance metric, commonly Euclidean distance.
Clustering different time series into similar groups is a challenging task because each data point is an ordered sequence.
The most common approach to time series clustering is to flatten the time series into a table, with a column for each time index (or an aggregation of the series), and directly apply standard clustering algorithms like k-means. …
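A minimal sketch of that flatten-and-cluster approach on toy data (illustration only; this is exactly the tabular treatment whose limitations the article goes on to discuss):

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
t = np.linspace(0, 2 * np.pi, 60)

# Flattened representation: one row per series, one column per time index.
X = np.vstack([np.sin(t) + rng.normal(scale=0.1, size=60) for _ in range(10)]
              + [np.cos(t) + rng.normal(scale=0.1, size=60) for _ in range(10)])

# Standard k-means on the rows, using plain Euclidean distance.
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
```

This works here because the series are aligned in time; it breaks down when similar series are shifted or warped relative to one another.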
Why? Existing tools are not well suited to time series tasks and do not integrate easily with one another. Methods in the scikit-learn package assume that data is structured in a tabular format and that each column is i.i.d., assumptions that do not hold for time series data. Packages that do contain time series learning modules, such as statsmodels, do not integrate well with one another. Further, many essential time series operations, such as splitting data into train and test sets across time, are not available in existing Python packages.
To address these challenges, sktime was created.
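sktime ships utilities for the temporal splitting mentioned above (it has a function with this very name); the sketch below is a simplified stand-in to show the idea, not sktime's implementation:

```python
import numpy as np

def temporal_train_test_split(y, test_size=0.25):
    # Split across time: the earliest observations form the training
    # set and the most recent form the test set. A random shuffle here
    # would leak future information into training.
    split = int(round(len(y) * (1 - test_size)))
    return y[:split], y[split:]

y = np.arange(100)
y_train, y_test = temporal_train_test_split(y)
```

The key design choice is that the split point is an index in time, so every training observation precedes every test observation.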
A full-stack data scientist is a jack-of-all-trades who works on every stage of the data science lifecycle, from beginning to end.
The scope of a full-stack data scientist covers every component of a data science business initiative, from identifying business problems to training and deploying machine learning models that provide benefit to stakeholders.
The year is 2019 and you have deployed a machine learning model that forecasts demand for toilet paper (or anything else, really). In 2020, COVID-19 emerges, sending consumers to stores to snatch up unprecedented quantities of toilet paper. The new sales numbers are not outliers: consumer behavior has genuinely changed, with people staying home far more often. As a result, your toilet paper demand model no longer matches the new, COVID-era consumer demand.
How can machine learning systems account for such changes that hurt model performance?
The world is inherently dynamic and nonstationary — that is, constantly changing.
Data scientist working in the financial services industry