# Categorical Time-Series Analysis

## Introduction

**CategoricalTimeSeries.jl** is a Julia package regrouping methods of categorical time-series analysis. The term *categorical* commonly refers to two types of data: *nominal* and *ordinal*.

**Nominal**, or labeled values represent *discrete* units that have no intrinsic *order*, like common types of pet:

- Cat
- Dog
- Bird

**Ordinal** values on the other hand represent *discrete* and *ordered* units, like the size of a coffee cup:

- Small
- Medium
- Large

When categorical data is layed out in function of time, one speaks of *categorical time-series*.

Often, especially when dealing with ordinal time-series, it is enough to map the different values to a set of integers to carry the analysis. However, when it is not sufficient **CategoricalTimeSeries.jl** is here to help.

## Overview

The package lets you carry four main kind of analysis: **Spectral analysis**, **Data clustering**, **correlations analysis** and **motif recognition**. Other functionnalities are also avalaible (see misc.). These methods are agnostic to the type of data used (ordinal or nominal) as they do not rely on a pre-established ordering. Here is a quick overview of these methods, for more details go to the specific sections.

#### Spectral analysis

The standard approach to study spectral properties in categorical time-series is to map the different values to a set of numbers. While the overall shape of the spectrum is usually unaffected by this operation, peaks representing cyclic behaviors in the time-series can completely disappear depending on the choice of mapping. To tackle this issue, one can use the spectral envelope method. It was developed by *David S. Stoffer* in order to identify optimal mappings.

#### Data clustering

A categorical time-series might have many different possible values, but these values are rarely independant. Some categories can be loosely equivalent to one-another. For example, among the values `"male"`

, `"female"`

, `"house"`

and `"car"`

, it is evident that `"male"`

and `"female"`

refer to similar concepts and could be grouped in one single category. Data clustering is the task of identifying such relationships in the time-series in order to reduce it to a more efficient representation. Here, an implementation of the *information bottleneck* concept is used to this end.

#### Motif recognition

Time-series sometimes present repeating motifs (or patterns) that are worthwhile identifying. In categorical time-series, the lack of proper distance measurement complicates this task, nonetheless several methods have been developed to this end. An implementation using the concept of *random projection* is used here.

#### Correlation analysis

The notion of autocorrelation function is formally not defined for a categorical time-series. Yet, it might be of interest to know how inter-dependent the values of the time-series are. Efforts have been made to generalize the concept of linear correlations to categorical time-series. This package implements several of these methods.

## Installation

The source code is available on GitHub, otherwise,
**CategoricalTimeSeries.jl** can be installed with

```
using Pkg
Pkg.add("CategoricalTimeSeries")
```

To use it, you need to import it:

```
using CategoricalTimeSeries
```