Thoughts and Theory

The FASTEST state-of-the-art algorithm for time series classification with Python

Most state-of-the-art (SOTA) time series classification methods are limited by high computational complexity. This makes them slow to train on smaller datasets and effectively unusable on large datasets.

Recently, ROCKET (RandOm Convolutional KErnel Transform) achieved SOTA accuracy in a fraction of the time required by other SOTA time series classifiers. ROCKET transforms time series into features using random convolutional kernels and passes the features to a linear classifier.
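A minimal sketch of the idea, using only the Python standard library. It is not the ROCKET implementation: the kernel length is fixed and there is no dilation or padding, which are simplifications. The two pooled statistics (the maximum and the proportion of positive values) follow the ROCKET paper as I understand it. The function names `random_kernel`, `apply_kernel`, and `transform` are made up for illustration.

```python
import math
import random


def random_kernel(rng, length=9):
    """Draw one random kernel: mean-centered Gaussian weights and a uniform bias."""
    weights = [rng.gauss(0.0, 1.0) for _ in range(length)]
    mean = sum(weights) / length
    weights = [w - mean for w in weights]
    bias = rng.uniform(-1.0, 1.0)
    return weights, bias


def apply_kernel(series, weights, bias):
    """Slide the kernel over the series (no padding, no dilation) and pool the output."""
    k = len(weights)
    outputs = [
        sum(w * series[i + j] for j, w in enumerate(weights)) + bias
        for i in range(len(series) - k + 1)
    ]
    # Two features per kernel: the maximum and the proportion of positive values (PPV).
    ppv = sum(1 for o in outputs if o > 0) / len(outputs)
    return max(outputs), ppv


def transform(series, kernels):
    """Turn one time series into a flat feature vector (two features per kernel)."""
    features = []
    for weights, bias in kernels:
        features.extend(apply_kernel(series, weights, bias))
    return features


rng = random.Random(0)
kernels = [random_kernel(rng) for _ in range(100)]
series = [math.sin(0.3 * t) for t in range(50)]  # toy series, for illustration only
features = transform(series, kernels)
print(len(features))  # 100 kernels x 2 features each
```

The resulting feature vectors, one per series, would then be passed to a linear classifier.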

MiniRocket is even faster!

MiniRocket (MINImally RandOm Convolutional KErnel Transform) is a (nearly) deterministic reformulation of ROCKET that is up to 75 times faster on larger datasets and achieves roughly equivalent accuracy.


Opinion

The “sklearn” for machine learning on streaming data

Conventional machine learning algorithms, such as linear regression and xgboost, operate in “batch” mode. That is, they fit a model using a full dataset in one go. Updating that model with new data requires fitting a brand new model from scratch using both the new data and the old data.

In many applications, this can be difficult or impossible! It requires all data to fit into memory, which isn’t always possible. The model itself can be slow to re-train. Retrieving older data for the model can be a big challenge, particularly in applications where data is continuously generated. …
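For contrast, here is a minimal sketch of a model that learns from one observation at a time, so no past data needs to be stored or re-read. It is a plain stochastic gradient descent linear regression written with the standard library. The class `OnlineLinearRegression` and its method names are hypothetical and do not represent any particular library's API. The simulated stream and the learning rate are assumptions chosen so the toy example converges.

```python
class OnlineLinearRegression:
    """Linear regression updated one observation at a time with SGD."""

    def __init__(self, n_features, learning_rate=0.01):
        self.weights = [0.0] * n_features
        self.bias = 0.0
        self.learning_rate = learning_rate

    def predict_one(self, x):
        return self.bias + sum(w * xi for w, xi in zip(self.weights, x))

    def learn_one(self, x, y):
        # Gradient step on the squared error of this single observation.
        error = self.predict_one(x) - y
        self.weights = [
            w - self.learning_rate * error * xi for w, xi in zip(self.weights, x)
        ]
        self.bias -= self.learning_rate * error
        return self


model = OnlineLinearRegression(n_features=1, learning_rate=0.05)

# Simulated noise-free stream following y = 2x + 1; each point is seen once
# and then discarded.
for step in range(5000):
    x = [(step % 10) / 10.0]
    y = 2.0 * x[0] + 1.0
    model.learn_one(x, y)

print(model.weights, model.bias)
```

The model's memory footprint stays constant no matter how long the stream runs, which is the property batch learners lack.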


How to perform temporal cross validation with sktime in Python

Cross validation is a useful procedure to help select optimal hyperparameters for a machine learning model. It is especially useful for smaller datasets, where there is not enough data to create representative train, validation, and test sets. Simply stated, cross validation splits a single training dataset into multiple subsets of train and test datasets.

The simplest form is k-fold cross validation, which splits the training set into k smaller sets, or folds. For each split, a model is trained using k-1 folds of the training data. The model is then validated against the remaining fold. Then for each split, the…
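The k-fold splitting described above can be sketched in a few lines of standard-library Python. This is an illustration of plain k-fold splitting only, not of temporal cross validation and not of sktime's API. The function name `k_fold_splits` is hypothetical.

```python
def k_fold_splits(n_samples, k):
    """Yield (train_indices, validation_indices) for each of k folds."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        stop = start + size
        validation = indices[start:stop]
        train = indices[:start] + indices[stop:]  # the other k-1 folds
        yield train, validation
        start = stop


splits = list(k_fold_splits(n_samples=10, k=3))
for train, validation in splits:
    print("train:", train, "validation:", validation)
```

Note that every fold except the last trains on observations that come after the validation fold, which is why plain k-fold needs care when the data is ordered in time.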


Anyone can do it, in theory

There are many paths that lead to Medium success and I believe I have found one of them. I consistently average $150-$250 per month, excluding the $500 Medium bonuses. Once I started tagging my articles as #artificialintelligence, I earned the Top Writer in Artificial Intelligence tag. I write 1 or 2 articles per month, sometimes less frequently, depending on how much time I have.


Contribute to a growing Python package for time series machine learning

sktime is a popular new Python package for time series machine learning. The contributors continue to fix bugs and add new features, and they invite you to contribute too!

Why contribute to sktime?

  1. Improve your skills in machine learning and coding.
  2. Learn the nuts-and-bolts of machine learning algorithms.
  3. Build your resume.
  4. Give back to the open-source community. Many common machine learning tools are also open-source.

The community is particularly motivated to support new and/or anxious contributors. People who are looking to learn and develop their skills are welcomed and supported.

sktime Dev Days

Community members of all experience levels are invited to the…


Advice from a Top Writer in Artificial Intelligence

So you want to write a widely read article about Data Science / Machine Learning / Artificial Intelligence?

In May 2021, I was recognized as a top writer in AI and was among the top 1000 writers in the Medium Partner Program. My older articles still continue to receive views and often appear in Google searches. (Scroll to the bottom for a screenshot of my stats).

Read along to learn some of the keys to my success.

I earned this badge in May 2021 once I started tagging my articles with “Artificial Intelligence”.

Selecting Topics

I first learned about Medium as a data scientist searching for specific topics in Data Science. …


Machine Learning

Considerations for Anomaly Detection Machine Learning Tasks

Outlier detection is a machine learning task that aims to identify rare items, events, or observations that deviate from the “norm” or general distribution of the given data.

An anomaly is something that arouses suspicion that it was generated by a different data-generating mechanism.

The Outlier Detection Machine Learning Task

In the outlier detection task, the goal is to train an unsupervised model to find anomalies subject to two constraints:

  1. Minimize false negatives (aka catch as many anomalies as possible).
  2. Minimize false positives (aka when an anomaly is flagged, don’t be wrong).

In many applications, there is a third constraint: the “ground truth” of what are true…
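The two constraints in the list above correspond to recall (few false negatives) and precision (few false positives). A minimal sketch of how they could be measured, assuming a small set of labeled examples is available for evaluation. The labels and predictions below are made-up illustration data, with 1 marking an anomaly, and the function names are hypothetical.

```python
def confusion_counts(y_true, y_pred):
    """Count true positives, false positives, and false negatives (1 = anomaly)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, fn


def precision_recall(y_true, y_pred):
    tp, fp, fn = confusion_counts(y_true, y_pred)
    # Precision: of the points flagged, how many were real anomalies?
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: of the real anomalies, how many were flagged?
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


y_true = [0, 0, 1, 0, 1, 0, 0, 1, 0, 0]  # made-up ground truth
y_pred = [0, 1, 1, 0, 0, 0, 0, 1, 0, 0]  # made-up model flags
precision, recall = precision_recall(y_true, y_pred)
print(precision, recall)
```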


This is an exciting development! This will certainly be useful for many practitioners.


The scikit-learn for outlier detection machine learning tasks

is a Python library with a comprehensive set of scalable, state-of-the-art (SOTA) algorithms for detecting outlying data points in multivariate data. This task is commonly referred to as or .

The outlier detection task aims to identify rare items, events, or observations that deviate from the “norm” or general distribution of the given data.

My favorite definition: An anomaly is something that arouses suspicion that it was generated by a different data-generating mechanism.

Common applications of outlier detection include fraud detection, data error detection, intrusion detection in network security, and fault detection in mechanics.

Why Specific Algorithms for Anomaly Detection?

Practically speaking…


A Python-based, fast, parameter-free, and highly interpretable unsupervised anomaly detection method

Outliers, or anomalies, are data points that deviate from the norm of a dataset. They arouse suspicion that they were generated by a different mechanism.

Anomaly detection is (usually) an unsupervised learning task where the objective is to identify suspicious observations in data. The task is constrained by the cost of incorrectly flagging normal points as anomalous and failing to flag actual anomalous points.

Applications of anomaly detection include network intrusion detection, data quality monitoring, and price arbitrage in financial markets.

Copula-Based Outlier Detection — COPOD — is a new algorithm for anomaly detection. …

Alexandra Amidon

Data scientist working in the financial services industry
