SublimeText – Flatland

Bored of your current text editor? Try the blazing fast SublimeText: awesomeness included!

Top Five Features:

  • Split Editing
  • Command Palette
  • Package Control
  • Pretty JSON Configuration
  • Fast & Customizable

SublimeText 2 (ST2) takes a different approach from other text editors: the user comes first, and customization is the key. You can change almost anything (color scheme, theme, key bindings) just by writing a couple of JSON lines in the user config file or by running commands from the Command Palette. For instance, you can create your own theme and color scheme pair, completely changing the look and feel of the app. Interested? Take a look at these screenshots showing the optional Flatland theme and the Monokai color scheme:

Flatland Theme

Wait! There’s more. A few keystrokes with the Package Control plugin and you can install optional features in seconds.


ARFF reader/writer for MATLAB

After the early posts (1,2) on ARFF tools for MATLAB I would like to show some usage examples, but first I’m going to give some insights on their design, which could help in understanding the approach used by the ARFF library.

While writing the code I started from Weka’s ARFF documentation, looking in particular at the stable version of the ARFF specification. I then chose to leave out Sparse ARFF support, at least for the first implementation.

ARFF MATLAB

Since MATLAB code is a different world from the standard Java API used by Weka, I chose to implement the ARFF payload, the instances, as a single struct array representing the ARFF dataset. As you may know, the dataset comes with a brief description of the attributes of each instance (the instances being the real data). This extra piece of information, which is common to all instances, is located in the header section of the ARFF file. It describes each attribute’s name and type and gives Weka hints on how to read, update and write the entire file content.

For instance, a common header could include several attributes and just one nominal specification:

@RELATION example_dataset

@ATTRIBUTE idx NUMERIC
@ATTRIBUTE low NUMERIC
@ATTRIBUTE med NUMERIC
@ATTRIBUTE high NUMERIC
@ATTRIBUTE type_class { front, middle, rear }

and after that comes the payload (aka the instances):

@DATA
1,6,53,95,rear
2,27,57,96,rear
3,6,66,70,middle
4,7,42,78,front
5,17,65,80,middle
6,20,57,80,rear
...
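
To make the mapping concrete, here is a minimal sketch of how the first two instances above could be held in a MATLAB struct array. The field names simply mirror the @ATTRIBUTE names of the header; the variable name `data` is my own choice for illustration.

```matlab
% two instances of the example dataset as a 1x2 struct array
% (passing cell arrays to struct() creates one element per cell)
data = struct('idx', {1, 2}, 'low', {6, 27}, 'med', {53, 57}, ...
              'high', {95, 96}, 'type_class', {'rear', 'rear'});

% one field per attribute, one element per instance
disp(fieldnames(data)');
disp(numel(data));
```

Each element of the array is one instance, so adding a row to the dataset is just a matter of assigning `data(end+1)`.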

Being a simple text format is an advantage for in-place editing, and implementing the parser directly in MATLAB code is not hard. However, there is a small problem: dealing with nominal-specification attributes (i.e. attributes which allow a limited set of string values) isn’t so straightforward inside the MATLAB environment. To keep things simple I used a small workaround: an extra argument (nomspec) that describes the value mappings of the nominal attributes during ARFF parsing. It probably isn’t the cleanest solution, but it does its job.

Using a simple struct array to hold the whole dataset payload doesn’t help when it comes to nominal-specification attributes, because one can assign any datatype to a struct’s field. From the parser’s point of view, however, this can be overcome by introducing a simple convention: just append the “_class” string to the name of each struct field that needs to be mapped to a nominal-specification attribute.
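
Because nothing stops a struct field from holding an unexpected value, it can be useful to check the convention before saving. The following is only a sketch and is not part of the library: it assumes the `data` and `nomspec` shapes described here and collects, in a hypothetical `bad` list, every “_class” value that is not listed in the nominal specification.

```matlab
% example instances and nominal specification
data(1).amplitude = 123;  data(1).type_class = 'a';
data(2).amplitude = 456;  data(2).type_class = 'z';   % 'z' is not allowed
nomspec.type_class = { 'a', 'b', 'c' };

fields = fieldnames(data);
bad = {};   % collects 'field=value' pairs that violate the specification

for fn = 1 : length(fields)
    name = fields{fn};

    % only fields ending in "_class" are treated as nominal here
    if isempty(regexp(name, '_class$', 'once'))
        continue;
    end

    for n = 1 : length(data)
        value = data(n).(name);
        if ~isfield(nomspec, name) || ~any(strcmp(value, nomspec.(name)))
            bad{end+1} = sprintf('%s=%s', name, value);
        end
    end
end

disp(bad);
```

An empty `bad` list means every nominal value is covered by the specification.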

In terms of code this approach needs just a few lines when saving an ARFF dataset:

% add attributes to an instance
data(1).amplitude = 123;
data(1).type_class = 'a';
% ...

% define nominal specification
nomspec.type_class = { 'a', 'b', 'c' };

% save the dataset
arff_write(arff_file, data, relname, nomspec);

Or when loading it:

% import data
[data, relname, nomspec] = arff_read(arff_file);

% check nominal spec attribute
nomspec.type_class

>> { 'a' 'b' 'c' }

Using this hacky solution you can unleash the power of the ARFF file format in your MATLAB/Weka simulations without needing any dataset conversion. Handy, isn’t it?

For more extensive usage examples or for more info about these tools please look at the ARFF reader/writer page.

Latex – Highlight & Todo

A common problem while writing long LaTeX documents is highlighting portions of text and dealing with revisions. You may ask yourself: why would I use highlighters in a LaTeX doc? Then you are probably ignoring thousands of Microsoft Word lovers, aren’t you?! If you have ever tried to change the color of just a few lines of text, you may have already noticed the standard behavior of LaTeX: long lines of errors with weird messages like “dear user, don’t be silly!”.

If you want to introduce highlighting in your workflow then you have to rely on external packages like soul or soulutf8, paired with the color package for advanced customization. I’d go directly with soulutf8 if you deal with non-English texts.

Here is an MWE (Minimal Working Example) of its use:

\documentclass[a4paper,11pt]{article}

% soul package provides the \hl command
\usepackage{soul}
\usepackage{color}

% soul highlighting config (\sethlcolor sets the color used by \hl)
\definecolor{orange}{rgb}{1,0.5,0}
\sethlcolor{orange}

\usepackage{lipsum}

\begin{document}

\lipsum[1]
\hl{Here is your highlighted text! Please stop abusing lipsum.}
\lipsum[2]

\end{document}

If you need more features and want a fancier solution for dealing with revisions, a good option is to jump directly to the more powerful todonotes package. Take a look at the todonotes documentation for the complete list of features and customizations. Here is a picture of what you can achieve with todonotes.

Latex: todonotes package example!


Wordle: Word Clouds

Have you heard about word clouds? WordPress users know them well in the form of “tag clouds”, but what about a word cloud built from the content of your posts? Well, for that kind of cloud there is Wordle!

Wordle is a toy for generating “word clouds” from text that you provide. The clouds give greater prominence to words that appear more frequently in the source text. You can tweak your clouds with different fonts, layouts, and color schemes. The images you create with Wordle are yours to use however you like. You can print them out, or save them to the Wordle gallery to share with your friends.

It is fun, quick and powerful! A good example of a well-done Java applet. Here is my blog’s word cloud (extracted from the content visible on the home page).

Word Cloud 2012-11

Latex Footnotes – Tables & Bottom of Page

A common problem while using LaTeX is placing footnotes inside tables. If you have tried to insert a footnote into a cell, you have already noticed the standard behavior of LaTeX: the footnote gets lost and doesn’t appear on your page.

If you want to change the standard behavior you have to rely on the minipage environment or on an external package like footnote. Personally I don’t like the minipage approach too much, but it can be useful if you want to display the notes immediately after the table.

Generally I choose the footnote package, which provides the savenotes environment: a simple wrapper around your whole table (not only the tabular environment) that catches all of the table’s footnotes and displays them according to the standard LaTeX behavior (i.e. after the last paragraph of the page).

Here is an MWE (Minimal Working Example) of its use:

% table's footnotes - minimum working example
\documentclass[12pt]{article}

% footnote package will provide savenotes environment
\usepackage{footnote}

\begin{document}

\begin{savenotes}
  \begin{table}[ht]
  \centering
  \begin{tabular}{|l|c|c|}
    \hline
    A & 1 & 2 \footnote{This is the first footnote.} \\
    \hline
    B & 2 & 1 \\
    \hline
    C & 3 \footnote{This is the second footnote.} & 3 \\
    \hline
  \end{tabular}
  \caption{A table caption.}
  \end{table}
\end{savenotes}

\end{document}


ARFF writer for MATLAB

Following the other post about the ARFF reader, I wrote a simple MATLAB utility function to export data in ARFF, Weka’s preferred format.

EDIT: for more info about the ARFF writer look at the ARFF reader/writer page.

The ARFF writer avoids the need to pass through the CSV format for data export, giving you more control over attribute types and, last but not least, over nominal-specification attributes, which can be fundamental if you have a large dataset to classify.

Data exported in ARFF format (*.arff or *.arff.gz) by the writer can easily be imported back into your MATLAB environment using the ARFF reader, without any need for data conversion.

Here is the writer source code. In a later post we’re going to see how to use the ARFF writer and reader in your MATLAB workflow, with some examples.

% ARFF_WRITE - Saves a MATLAB's struct array to file using ARFF file format.
%
%   ARFF_WRITE(arff_file, DATA, relname, nomspec)
%       arff_file => output file (.arff / .arff.gz extension)
%       DATA => struct array representing data and attributes (n x attrs)
%       relname => relation name (string)
%       nomspec => struct array defining nominal-specification attributes
%
%   NOTES:
%       Attribute name is taken from DATA struct fieldname and attribute
%       type is taken from field data-type.
%
%       Append "_class" to a DATA struct fieldname to save an attribute as
%       nominal-specification attribute and specify the nominal-names
%       inside NOMSPEC struct array using as fieldname the DATA struct's
%       fieldname and as content a cell array of names (string).
%
%       Append "_date" to a DATA struct fieldname and use numerical date
%       representation (using datenum) to save an attribute as date type
%       (using 'yyyy-mm-dd HH:MM:SS' format in ARFF file).
%
%       TODO -- According to SPEC any attribute name that contains spaces
%       must be quoted.
%
%       See ARFF format specification on WEKA site.

% Authors:
%   Valerio De Carolis          <valerio.decarolis@gmail.com>
%
%  28 September 2012 - University of Rome "La Sapienza"

function [] = arff_write(arff_file, data, relname, nomspec)

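    if nargin < 3
        error('MATLAB:input','Not enough inputs!');
    end

    % nomspec is optional: default to an empty struct so the
    % isstruct/isfield checks below work when it is omitted
    if nargin < 4
        nomspec = struct();
    end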
   
    if isempty(data) || ~isstruct(data)
        error('MATLAB:input','Please use struct data input!');
    end
   
    if isempty(arff_file)
        arff_file = sprintf('output-%d.arff', randi(1000,1));
    end
   
    if isempty(relname)
        relname = sprintf('relname-%d', randi(1000,1));
    end
   
    % check file extension
    [arff_path, arff_name, ext] = fileparts(arff_file);
   
    if strcmpi(ext,'.arff')
       
        % open file
        fid = fopen(arff_file, 'w+t');
       
    elseif strcmpi(ext,'.gz')
       
        % temp file
        outfile = fullfile(tempdir, arff_name);
       
        % open file
        fid = fopen(outfile, 'w+t');
   
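    else
        error('%s is not a valid arff_file', arff_file);
    end

    % fopen returns -1 when the file cannot be created
    if fid == -1
        error('MATLAB:file','Cannot open output file!');
    end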
   
    % write relname
    fprintf(fid, '@RELATION %s\n\n', relname);
   
    % write attributes
    fields = fieldnames(data);
    ftypes = zeros(size(fields));
   
    for fn = 1 : length(fields)
       
        if isnumeric( data(1).(fields{fn}) )
           
            dt = strfind(fields{fn}, '_date');
           
            if isempty(dt)
                type = 'NUMERIC';
                ftypes(fn) = 0;
            else
                % ARFF expects a Java SimpleDateFormat pattern (MM = month, mm = minutes),
                % which differs from the MATLAB datestr pattern used when writing values
                type = 'DATE "yyyy-MM-dd HH:mm:ss"';
                ftypes(fn) = 3;
                %name = fields{fn}(1:max(dt)-1);
            end
           
        elseif ischar( data(1).(fields{fn}) )
           
            ct = strfind(fields{fn}, '_class');
           
            if isempty(ct)
                type = 'STRING';
                ftypes(fn) = 2;
            else          
                if isstruct(nomspec) && isfield(nomspec, fields{fn}) && ...
                        iscell(nomspec.(fields{fn}))
                   
                    % build the "{ a, b, c }" nominal specification; this also
                    % works for a single name (reshape accepts column cell arrays)
                    names = reshape(nomspec.(fields{fn}), 1, []);
                    type = ['{ ' strjoin(names, ', ') ' }'];


                else
                    fclose(fid);
                    error('MATLAB:input','Missing nominal specification for %s field!', fields{fn});
                    % TODO infer class specification from data
                end
               
                ftypes(fn) = 1;
                %name = fields{fn}(1:max(ct)-1);
            end
           
        else
            fclose(fid);
            error('MATLAB:input','Cannot convert %s field to ARFF format!', fields{fn});
        end
       
        fprintf(fid, '@ATTRIBUTE %s %s\n', fields{fn}, type);
        %fprintf(fid, '@ATTRIBUTE %s %s\n', name, type);
       
    end
   
    % write data
    fprintf(fid, '\n@DATA\n');
    content = '';
   
    for n = 1 : length(data)
       
        for fn = 1 : length(fields)
           
            if isempty(data(n).(fields{fn}))
                content = '?';
            else
                switch ftypes(fn)
                    case 0
                        % 15 significant digits avoid the default 4-decimal rounding
                        content = num2str( data(n).(fields{fn}), 15 );
                    case 1
                        content = data(n).(fields{fn});
                    case 2
                        content = data(n).(fields{fn});
                    case 3
                        content = ['"' datestr(data(n).(fields{fn}), 'yyyy-mm-dd HH:MM:SS') '"'];
                end
            end
           
            if fn < length(fields)
                fprintf(fid,'%s,', content);
            else
                fprintf(fid,'%s', content);
            end
           
        end
       
        fprintf(fid,'\n');
       
    end
   
    % close file
    fclose(fid);
   
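    % compress the temporary .arff into the target folder & remove it
    if exist('outfile','var') && ~isempty(outfile)
        % a bare file name has no folder part: use the current folder
        if isempty(arff_path)
            arff_path = pwd;
        end
        gzip(outfile, arff_path);
        delete(outfile);
    end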

end

% References:
%   [1]: http://www.cs.waikato.ac.nz/ml/weka/arff.html

ARFF reader for MATLAB

Recently I needed to export some simulation data, generated with MATLAB, to Weka, a very interesting Machine Learning tool written in Java.

EDIT: for more info about the ARFF reader look at the ARFF reader/writer page.

So I started to read some documentation on the Weka website, because I wanted to learn more and use all the possibilities offered by Weka’s preferred file format: ARFF.

One may ask why I’m not using the really simple CSV file format. The reason is as simple as CSV itself: it can’t represent nominal-specification attributes (like class-style ones) in a straightforward way.

For these reasons I wrote a couple of MATLAB utility functions to read and write data from ARFF files. Currently supported extensions are .arff and .arff.gz (using MATLAB’s built-in gzip/gunzip functions with temp files).
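
The .arff.gz handling boils down to a round trip through a temporary file. Here is a minimal sketch of that idea, using only MATLAB’s built-in gzip, gunzip and tempname functions; the file name and its contents are made up for illustration and the real functions do more checking.

```matlab
% write a tiny ARFF file into a fresh temporary folder
workdir = tempname;
mkdir(workdir);
arff_file = fullfile(workdir, 'example.arff');

fid = fopen(arff_file, 'wt');
fprintf(fid, '@RELATION example\n\n@ATTRIBUTE idx NUMERIC\n\n@DATA\n1\n');
fclose(fid);

% compress: gzip creates example.arff.gz in the output folder
gzip(arff_file, workdir);
delete(arff_file);

% decompress: gunzip returns the names of the extracted files
dec_files = gunzip(fullfile(workdir, 'example.arff.gz'), workdir);

% read the first line back, then clean up the temporary folder
fid = fopen(dec_files{1}, 'rt');
first_line = fgetl(fid);
fclose(fid);
rmdir(workdir, 's');

disp(first_line);
```

Working inside a dedicated temporary folder keeps the decompressed copy away from the original data and makes the cleanup a single rmdir call.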

Here is the ARFF reader’s source code. In a later post we’re going to see the ARFF writer’s code, together with a couple of usage examples.

% ARFF_READ - Read content of an ARFF file to a MATLAB's struct array.
%
%   [DATA, relname, nomspec] = ARFF_READ(arff_file)
%       arff_file => input file (.arff / .arff.gz extension)
%       relname => relation name (string)
%       DATA => struct array representing data and attributes (n x attrs)
%       nomspec => struct array defining nominal-specification attributes
%
%   NOTES:
%       See ARFF_WRITE to read notes about relname and nomspec.
%       See ARFF format specification on WEKA site.

% Authors:
%   Valerio De Carolis          <valerio.decarolis@gmail.com>
%
%  28 September 2012 - University of Rome "La Sapienza"

function [data, relname, nomspec] = arff_read(arff_file)

    if nargin < 1
        error('MATLAB:input','Not enough inputs!');
    end
   
    if isempty(arff_file)
        error('MATLAB:input','Bad file name!');
    end
   
    % check file extension
    [~, ~, ext] = fileparts(arff_file);
   
    if strcmpi(ext,'.arff')
       
        % open file
        fid = fopen(arff_file, 'rt');
       
    elseif strcmpi(ext,'.gz')
       
        % temporary working dir
        outdir = tempdir;
       
        % decompress
        dec_files = gunzip(arff_file, outdir);
       
        if ~isempty(dec_files)
            fid = fopen(dec_files{1}, 'rt');
        else
            error('%s is not a valid arff_file', arff_file);
        end            
   
    else
        error('%s is not a valid arff_file', arff_file);
    end    
   
    if fid == -1
        error('MATLAB:file','File not found!');
    end  
   
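    % default output: stays an empty struct when the file has no nominal attributes
    nomspec = struct();

    % read relname
    relname = [];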
   
    while isempty(relname)
        tline = fgetl(fid);
       
        if ~ischar(tline)
            % close the file before raising the error (error() never returns)
            fclose(fid);
            error('MATLAB:file','ARFF file not recognized!');
        end
       
        % avoid parsing @DATA and skip blank lines
        if length(tline) > 9 && tline(1) == '@' && strcmpi(tline(2:9),'RELATION')
            relname = tline(11:end);
            break;
        end            
    end
   
    % read attributes
    fields = {};
    ftypes = [];
   
    floop = 1;
    fn = 1;
   
    while floop
        tline = fgetl(fid);
       
        if ~ischar(tline)
            break;
        end
       
        % parse only @ATTRIBUTE lines (skips blank lines, comments and @DATA)
        tok = regexp(tline, '^\s*@ATTRIBUTE\s+(\S+)\s*(.*)$', 'tokens', 'once', 'ignorecase');

        if ~isempty(tok)

            % attribute name and type definition; the type is everything after
            % the name, so DATE formats and { a, b, c } lists are kept whole
            fields{fn} = tok{1};
            typedef = strtrim(tok{2});

            if isempty(typedef)
               fclose(fid);
               error('MATLAB:file','ARFF file not recognized!');
            end
           
            if typedef(1) == '{' && typedef(end) == '}'
                ftypes(fn) = 1;
                %nomspec.(fields{fn}) = typedef;
               
                % out is a cell with parsed classes assuming { x, x, x } format  
                out = textscan(typedef, '%s', 'Delimiter', ' ,{}', 'MultipleDelimsAsOne', 1);
               
                % expand cell (avoid cell of cell)
                nomspec.(fields{fn}) = out{:};
            else
               if any(strcmpi(typedef, {'NUMERIC','REAL','INTEGER'}))
                   % the ARFF spec treats REAL and INTEGER as NUMERIC
                   ftypes(fn) = 0;
               elseif strcmpi(typedef,'STRING')
                   ftypes(fn) = 2;
               elseif strncmpi(typedef, 'DATE', 4)
                   ftypes(fn) = 3;
                   % TODO implement date-format parsing
               else
                   fclose(fid);
                   error('MATLAB:file','ARFF file not recognized!');
               end
            end
           
            fn = fn + 1;
        end
       
    end
   
    % create data struct
    data = struct();
   
    for fn = 1 : length(fields)
        data.(fields{fn}) = [];
    end
   
    % store empty struct
    data_tmpl = data;
       
    % rewind file
    fseek(fid,0,-1);
       
    % seek data
    has_data = 0;
   
    while floop
        tline = fgetl(fid);
       
        % stop at end of file
        if ~ischar(tline)
            break;
        end

        % tolerate trailing whitespace after @DATA
        if strcmpi(strtrim(tline), '@DATA')
            has_data = 1;
            break;
        end
    end
   
    if has_data == 1
       
        dcnt = 1;
       
        while floop
            tline = fgetl(fid);

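            % skip blank lines and ARFF comments (lines starting with %)
            if ischar(tline) && ~isempty(strtrim(tline)) && tline(1) ~= '%'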
               
                % find values
                vt = strfind(tline,',');
               
                % init with empty struct
                data(dcnt) = data_tmpl;
               
                for k = 1 : length(vt) + 1
               
                    if k == 1
                        if isempty(vt)
                            content = tline(1:end);
                        else
                            content = tline(1:vt(k)-1);
                        end
                    elseif k <= length(vt)
                        content = tline(vt(k-1)+1:vt(k)-1);
                    else
                        content = tline(vt(k-1)+1:end);
                    end

                    switch ftypes(k)
                        case 0
                            data(dcnt).(fields{k}) = str2double( content ); %str2num( content );
                        case 3
                            data(dcnt).(fields{k}) = datenum( content(2:end-1), 'yyyy-mm-dd HH:MM:SS' );
                        otherwise
                            data(dcnt).(fields{k}) = content;
                    end
               
                end
               
                dcnt = dcnt + 1;
               
            end

            if ~ischar(tline)
                break;
            end
        end
       
    end
   
    % close file
    fclose(fid);
   
    % remove temporary decompressed file
    if exist('dec_files','var') && ~isempty(dec_files)
        delete(dec_files{1});
    end

end

% References:
%   [1]: http://www.cs.waikato.ac.nz/ml/weka/arff.html