# Reorganizing data-matrix

Hi all,

Background: I’m planning to analyze daily data covering all months of the year. The total time horizon is six years. The data-matrix has the following structure (please, see attached pdf-file observation for 1980): 12 columns – one for each month and 31 rows for daily observations – but note that February has 29 observations and April, June, Sept and November, respectively have 30). Problems: (1) Is it possible to reorganize the data in Shazam so the daily data/observations for each of the months are linked together in a single column, i.e. month by month etc.? (2) I’m also interesting in generating monthly average data by using this matrix (monthly scale over six years) and the first best is to store the data in one column.

testdata.csv

edit retag close merge delete

Sort by » oldest newest most voted

To generate monthly means from the data (see other answer about replacing the semicolons with commas and adjusting for missing values) simply load the file, create and save means using:

read(testdata1.csv) / names
set skipmiss
stat / mean=means


The result is:

NAME        N    MEAN        ST. DEV      VARIANCE     MINIMUM      MAXIMUM
YR1980       31   16.000      9.0921      82.667       1.0000       31.000
JAN          31   21.997      3.7186      13.828       14.900       31.700
FEBR         31   19.219      4.9212      24.218       10.600       28.600
MARCH        31   26.394      4.3367      18.807       20.500       37.900
APRIL        31   29.348      5.0025      25.025       20.400       39.400
MAY          31   31.787      3.0804      9.4892       25.500       37.200
JUNE         31   31.781      2.0271      4.1089       27.400       35.800
JULY         31   34.055      3.8160      14.562       19.200       37.700
AUG          30   34.970      2.4419      5.9629       28.200       38.300
SEPT         30   28.493      2.2111      4.8889       21.000       31.400
OCTOB        30   27.907      2.6245      6.8882       19.400       30.900
NOV          30   25.973      2.0758      4.3089       20.000       28.800
DEC          29   21.741      3.7812      14.298       15.400       28.800

more

SHAZAM reads text data easily if the delimiter(separator) is a space, comma or tab.

However, the file above uses a semicolon separator (applied automatically by MS Excel in some locales) which is not supported. To work with this data, replace the semicolons with either commas, tabs or spaces using a text editor.

Alternatively the 'Text to Columns' feature in MS Excel can be useful for splitting data before then saving it as a CSV format file explicitly selecting the Comma Delimited version, if it is available.

The file also has some missing values at the bottom and these should be replaced with a missing value code. The default in SHAZAM is -99999. After replacing with commas and adding the missing value code, the file then becomes: testdata1.csv

Here is a simple script to then produce the vector from the comma delimited (separated) file:

* Read the data into a matrix skipping the names on the first row.
* Note: It is required to specify the number of columns.
read(testdata1.csv) mydata / skiplines=1 cols=13
matrix mymat = mydata

* Calculate the total number of rows to be in the new vector
gen1 length = $rows *$cols

* Set the sample size for the new vector using this length
sample 1 length

* Create the vector from the matrix columns with the matrix command
matrix myvec = vec(mymat)


It is also possible to read the data as variables and use the matrix command to concatenate them. Doing it this way means there is no need to specify the number of columns so it can sometimes be preferable. Here is how it would be done with this dataset.

* Read the data using the first row as variable names and list the data
read (testdata.shd) / names list

* Create a matrix by concatenating the variables ('|' does matrix concatenation)
matrix mymat = JAN|FEBR|MARCH|APRIL|MAY|JUNE|JULY|AUG|SEPT|OCTOB|NOV|DEC


When doing it this way for many similar files a useful trick is to create a SHAZAM Character String (akin to an alias) for the data variable concatenation. This string can be substituted into a subsequent statement using the format [alias]. For example the immediately preceding command statement may be substituted for the two below and the alias months can be inserted wherever it is needed:

* Using a character string for months
months:JAN|FEBR|MARCH|APRIL|MAY|JUNE|JULY|AUG|SEPT|OCTOB|NOV|DEC
matrix mymat = [months]


Data in SHAZAM may be represented as either variables or matrices (or vectors). It is easy to interchange between the two and SHAZAM techniques can accept both. Sometimes it is preferable to use matrix data as a variable, particularly when you want to pull out a column of a matrix to a variable and don't wish to keep specifying the column index. In the above, to convert the vector to a data variable simply do:

* Convert a vector to a variable
genr myvar = myvec


To convert it back again:

* Convert a variable to a vector ...
more

Shazam help,

Thank a lot for looking into the problem and for making a program to reorganize the data matrix. I look forward to testing the Shazam script :)

Regards

( 2014-01-16 15:46:55 +0000 )edit

## Stats

Asked: 2014-01-15 13:08:58 +0000

Seen: 794 times

Last updated: Jan 21 '14