Spectral Envelope

The spectral envelope is a tool for studying cyclic behavior in categorical data. It is more informative than the traditional approach of assigning an arbitrary number to each category and then estimating the power spectral density, because the result of that approach depends on the numbers chosen.

For each frequency in the spectrum, the spectral envelope finds the optimal real-valued mapping of the categories, i.e. the one that maximizes the normalized power spectral density at that frequency. Therefore, no matter what mapping is chosen for the different categories, the normalized power spectral density is always bounded above by the spectral envelope.
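The maximization described above can be sketched from scratch. This is only an illustration under stated assumptions, not the package's implementation: `naive_spectral_envelope` is a hypothetical name, no smoothing is applied, and the normalization may differ from the one used by `spectral_envelope`. The sketch encodes each category as an indicator vector (dropping one category as a reference), builds the periodogram matrix at each Fourier frequency, and takes the largest generalized eigenvalue with respect to the covariance of the indicators.

```julia
using LinearAlgebra, Statistics

# Hypothetical helper, not part of CategoricalTimeSeries: a raw (unsmoothed)
# spectral envelope computed from the periodogram of indicator vectors.
function naive_spectral_envelope(ts::AbstractVector)
    cats = unique(ts)
    n, k = length(ts), length(cats)
    # Indicator (one-hot) encoding; the last category is dropped as reference
    Y = Float64[ts[t] == cats[j] for t in 1:n, j in 1:k-1]
    Yc = Y .- mean(Y, dims = 1)            # center each indicator column
    V = cov(Y)                             # (k-1) x (k-1) covariance of the indicators
    freqs = (1:(n ÷ 2)) ./ n               # Fourier frequencies, in cycles per sample
    se = zeros(length(freqs))
    for (i, f) in enumerate(freqs)
        # Discrete Fourier transform of each indicator column at frequency f
        d = [sum(Yc[t, j] * cis(-2π * f * (t - 1)) for t in 1:n) for j in 1:k-1] ./ sqrt(n)
        P = real.(d * d')                  # real part of the periodogram matrix
        # Largest generalized eigenvalue of P * β = λ * V * β
        se[i] = maximum(eigvals(Symmetric(P), Symmetric(V)))
    end
    return freqs, se
end

# A sequence that repeats every 3 symbols should peak at frequency 1/3
freqs, se = naive_spectral_envelope(repeat(["A", "C", "G"], 20))
peak_frequency = freqs[argmax(se)]
```

The eigenvector associated with the largest eigenvalue would give the optimal mapping at that frequency; it is omitted here to keep the sketch short.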

The spectral envelope was introduced in: David S. Stoffer, David E. Tyler and Andrew J. McDougall, "Spectral analysis for categorical time series: Scaling and the spectral envelope".

Main functions


spectral_envelope — Function


spectral_envelope(ts; m = 3)

Computes the spectral envelope of an input categorical time-series.
The degree of smoothing can be chosen by the user.

Parameters:

  • ts (Array{Any,1}): 1-D Array containing input categorical time-series.
  • m (Int): Smoothing parameter. Sets how many neighboring points are involved in the smoothing (weighted average). Defaults to 3.

Returns: (freq, se, eigvecs), where freq contains the frequencies of the power spectrum, se the values of the spectral envelope at each frequency in 'freq', and eigvecs the optimal real-valued mapping at each frequency point.
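To give a feel for what a smoothing parameter such as m does, here is a minimal sketch of a weighted moving average. It assumes m neighbors on each side of a point and triangular weights; both are assumptions made for illustration, and the window actually used by `spectral_envelope` may differ. `smooth_triangular` is a hypothetical helper, not part of the package.

```julia
# Hypothetical helper: smooth a spectrum with a symmetric triangular window
# covering m neighbors on each side (assumed weights, not the package's).
function smooth_triangular(x::AbstractVector{<:Real}, m::Integer)
    m == 0 && return float.(x)               # m = 0 leaves the values untouched
    w = Float64[m + 1 - abs(j) for j in -m:m]
    n = length(x)
    out = zeros(n)
    for i in 1:n
        acc, wsum = 0.0, 0.0
        for (idx, j) in enumerate(-m:m)
            1 <= i + j <= n || continue      # skip neighbors that fall off the ends
            acc += w[idx] * x[i + j]
            wsum += w[idx]
        end
        out[i] = acc / wsum                  # renormalize near the boundaries
    end
    return out
end

smoothed = smooth_triangular([0.0, 4.0, 0.0], 1)
```

A larger m trades frequency resolution for a less noisy estimate, which is why narrow peaks are easier to see with small values of m.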


get_mappings — Function


get_mappings(data, freq; m = 3)

Computes, for a given frequency freq, the optimal mappings for the categories in data. Scans the vicinity of freq to find the maximum of the spectral envelope, prints a summary and returns the corresponding mappings.

Parameters:

  • data (Array{Any,1}): 1-D Array containing input categorical time-series.
  • freq (Float): Frequency for which the mappings are wanted. The vicinity of 'freq' is scanned to find the maximal value of the spectral envelope.
  • m (Int): Smoothing parameter. Sets how many neighboring points are involved in the smoothing (weighted average). Defaults to 3.

Returns: mappings, the optimal mappings at the maximum found around 'freq'.
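The scan around 'freq' can be pictured as a local peak search on the output of spectral_envelope. The sketch below is an assumption about the general idea, not the package's code: `peak_near` and its `halfwidth` search radius are hypothetical, and the width of the window actually scanned by get_mappings is not stated here.

```julia
# Hypothetical helper: locate the largest envelope value near a target frequency.
# `halfwidth` is an assumed search radius chosen for illustration.
function peak_near(freqs::AbstractVector{<:Real}, se::AbstractVector{<:Real},
                   target::Real; halfwidth::Real = 0.01)
    window = findall(f -> abs(f - target) <= halfwidth, freqs)
    isempty(window) && error("no frequency within $halfwidth of $target")
    strength, k = findmax(se[window])        # maximum inside the window only
    return window[k], freqs[window[k]], strength
end

# Toy envelope with a small bump at 0.33 and a larger one at 0.34
freqs = collect(0.0:0.01:0.5)
se = zeros(length(freqs))
se[34] = 0.02
se[35] = 0.05
index, position, strength = peak_near(freqs, se, 0.33; halfwidth = 0.015)
```

The returned index could then be used to pick the matching entry of eigvecs, which holds the optimal mapping at each frequency point.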

Example

Applying the spectral envelope to study a segment of DNA from the Epstein-Barr virus and plotting the results:

using DelimitedFiles, Plots
using CategoricalTimeSeries

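# Locate the DNA test file shipped with the package
data_path = joinpath(dirname(dirname(pathof(CategoricalTimeSeries))), "test", "DNA_data.txt")
# Read the comma-delimited sequence of nucleotides
data = readdlm(data_path, ',')
# m = 0: no neighboring points are used for smoothing
f, se, eigvecs = spectral_envelope(data; m = 0)

plot(f, se, xlabel = "Frequency", ylabel = "Intensity", title = "test data: extract of Epstein-Barr virus DNA", label = "spectral envelope")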

To get the associated optimal mapping for the peak at frequency 0.33:

mappings = get_mappings(data, 0.33; m = 0)
>> position of peak: 0.33 strengh of peak: 0.02
print(mappings)
>> Dict{SubString{String}, Float64} with 4 entries:
  "A" => -0.59
  "T" => 0.55
  "C" => 0.0
  "G" => 0.6
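Once a mapping is available, it can be used to turn a categorical sequence into a real-valued series, which can then be analyzed with ordinary numerical tools. The snippet below is a small sketch: the dictionary values are copied from the output shown above, while the `sequence` fragment is made up for illustration and is not taken from the test data.

```julia
# Values copied from the printed mapping for the peak at frequency 0.33
mappings = Dict("A" => -0.59, "T" => 0.55, "C" => 0.0, "G" => 0.6)

# Made-up fragment, for illustration only
sequence = ["A", "T", "G", "C", "A"]

# Replace each nucleotide by its optimal real value
scaled = [mappings[base] for base in sequence]
```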