# Revision history

Note that the data above is not a CSV file; the delimiter used is a semicolon. To work with it, replace the semicolons with spaces using a text editor. Alternatively, the 'Text to Columns' feature in MS Excel can be useful for splitting the data. The file then becomes: testdata.shd
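
The semicolon replacement described above can also be scripted rather than done by hand. The following is a minimal Python sketch, used here only for the preprocessing step outside SHAZAM. The function names `replace_semicolons` and `convert_file` and the sample rows are hypothetical, and the sketch assumes no field contains a semicolon or an embedded space.

```python
def replace_semicolons(text, new_delimiter=" "):
    """Return a copy of text with every semicolon swapped for new_delimiter."""
    return text.replace(";", new_delimiter)


def convert_file(source_path, target_path, new_delimiter=" "):
    """Read a semicolon-delimited text file and write a converted copy."""
    with open(source_path, "r", encoding="utf-8") as source:
        converted_text = replace_semicolons(source.read(), new_delimiter)
    with open(target_path, "w", encoding="utf-8") as target:
        target.write(converted_text)


# Hypothetical sample standing in for the first rows of the data file.
sample = "YEAR;JAN;FEBR\n1990;1.5;2.5\n"
converted = replace_semicolons(sample)
print(converted)
```

A call such as `convert_file("testdata.txt", "testdata.shd")` would write the space-delimited copy; the input file name is an assumption, since the original file name is not given.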

After resolving the delimiter problem, here is a script (with comments) to produce the vector from the above file:

* Read the data into a matrix skipping the names on the first row.
* Note that you need to specify the number of columns.
read(testdata.shd) data / skiplines=1 cols=13
matrix mymat = data

* Calculate the total number of rows to be in the new vector
gen1 length = $rows * $cols

* Set the sample size for the new vector using this length
sample 1 length

* Create the vector from the matrix columns with the matrix command
matrix myvec = vec(mymat)
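
To make the stacking order concrete, here is a small Python sketch (not SHAZAM code) that imitates what the script does. It assumes vec stacks the matrix columns one beneath another, as the comment "from the matrix columns" suggests. The `vec` function below is a hypothetical stand-in written for illustration, and the 2 x 3 matrix is made-up data.

```python
def vec(matrix):
    """Stack the columns of a list-of-rows matrix into one flat list."""
    n_rows = len(matrix)
    n_cols = len(matrix[0]) if matrix else 0
    # Walk column by column, taking every row within each column.
    return [matrix[r][c] for c in range(n_cols) for r in range(n_rows)]


# Made-up matrix with 2 rows and 3 columns.
mymat = [[1, 2, 3],
         [4, 5, 6]]

# Same arithmetic as the gen1 line: rows times columns.
length = len(mymat) * len(mymat[0])

myvec = vec(mymat)
print(length)  # 6
print(myvec)   # [1, 4, 2, 5, 3, 6]
```

The length of the stacked vector equals rows times columns, which is why the script sets the sample size to that product before creating the vector.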


You may also read the data as variables and use the matrix command to concatenate them. Doing it this way means you don't need to specify the number of variables or observations (fields or rows), so it can sometimes be easier; SHAZAM will figure this out for you. Here is how it would be done with this dataset.

* Read the data using the first row as variable names and list the data
read (testdata.shd) / names list

* Create a matrix by concatenating the variables ('|' does matrix concatenation)
matrix mymat = JAN|FEBR|MARCH|APRIL|MAY|JUNE|JULY|AUG|SEPT|OCTOB|NOV|DEC


If you are doing it this way for many similar files, a useful trick is to create a SHAZAM Character String (akin to an alias) for part of a statement you need to use often. You can then substitute it into a subsequent statement using [<alias>]. For example, the immediately preceding command statement may be replaced by the two below, and the alias months can be inserted wherever it is needed later:

* Using a character string for months
months:JAN|FEBR|MARCH|APRIL|MAY|JUNE|JULY|AUG|SEPT|OCTOB|NOV|DEC
matrix mymat = [months]


Data in SHAZAM can be represented as either variables or matrices. It is easy to interchange between the two and SHAZAM techniques can accept either. Sometimes you may prefer to use the data as a variable and not as a matrix, particularly when you want to pull out a column of a matrix to a variable. To convert the vector in the last line to a data variable simply do:

* Convert a vector to a variable
genr myvar = myvec


To convert it back again:

* Convert a variable to a vector
matrix myvec = myvar


Note that the data above is not a CSV file; the delimiter used is a semicolon. To work with it, replace the semicolons with spaces using a text editor. Alternatively, the 'Text to Columns' feature in MS Excel can be useful for splitting the data. The file then becomes: testdata.shd

After resolving the delimiter problem, here is a script (with comments) to produce the vector from the above file:

* Read the data into a matrix skipping the names on the first row.
* Note that you need to specify the number of columns.
read(testdata.shd) data / skiplines=1 cols=13
matrix mymat = data

* Calculate the total number of rows to be in the new vector
gen1 length = $rows * $cols

* Set the sample size for the new vector using this length
sample 1 length

* Create the vector from the matrix columns with the matrix command
matrix myvec = vec(mymat)


You may also read the data as variables and use the matrix command to concatenate them. Doing it this way means you don't need to specify the number of variables or observations (fields or rows), so it can sometimes be easier; SHAZAM will figure this out for you. Here is how it would be done with this dataset.

* Read the data using the first row as variable names and list the data
read (testdata.shd) / names list

* Create a matrix by concatenating the variables ('|' does matrix concatenation)
matrix mymat = JAN|FEBR|MARCH|APRIL|MAY|JUNE|JULY|AUG|SEPT|OCTOB|NOV|DEC


If you are doing it this way for many similar files, a useful trick is to create a SHAZAM Character String (akin to an alias) for part of a statement you need to use often. You can then substitute it into a subsequent statement using [<alias>]. For example, the immediately preceding command statement may be replaced by the two below, and the alias months can be inserted wherever it is needed later:

* Using a character string for months
months:JAN|FEBR|MARCH|APRIL|MAY|JUNE|JULY|AUG|SEPT|OCTOB|NOV|DEC
matrix mymat = [months]


Data in SHAZAM can be represented as either variables or matrices. It is easy to interchange between the two and SHAZAM techniques can accept either. Sometimes you may prefer to use the data as a variable and not as a matrix, particularly when you want to pull out a column of a matrix to a variable. To convert the vector in the last line to a data variable simply do:

* Convert a vector to a variable
genr myvar = myvec


To convert it back again:

* Convert a variable to a vector
matrix myvec = myvar


Finally, vectors can be easily viewed using the print command:

print myvec


and to print in a single column use:

print myvec / byvar


Note that the data above is not a CSV file; the delimiter used is a semicolon. To work with it, replace the semicolons with spaces using a text editor. Alternatively, the 'Text to Columns' feature in MS Excel can be useful for splitting the data. The file then becomes: testdata.shd

After resolving the delimiter problem, here is a script (with comments) to produce the vector from the above file:

* Read the data into a matrix skipping the names on the first row.
* Note that you need to specify the number of columns.
read(testdata.shd) data / skiplines=1 cols=13
matrix mymat = data

* Calculate the total number of rows to be in the new vector
gen1 length = $rows * $cols

* Set the sample size for the new vector using this length
sample 1 length

* Create the vector from the matrix columns with the matrix command
matrix myvec = vec(mymat)


You may also read the data as variables and use the matrix command to concatenate them. Doing it this way means you don't need to specify the number of variables (fields), so it can sometimes be preferable; SHAZAM will instead figure the dimensions out for you. Here is how it would be done with this dataset.

* Read the data using the first row as variable names and list the data
read (testdata.shd) / names list

* Create a matrix by concatenating the variables ('|' does matrix concatenation)
matrix mymat = JAN|FEBR|MARCH|APRIL|MAY|JUNE|JULY|AUG|SEPT|OCTOB|NOV|DEC


If you are doing it this way for many similar files, a useful trick is to create a SHAZAM Character String (akin to an alias) for the data variable concatenation. You can then substitute this string into a subsequent statement using the format [<alias>]. For example, the immediately preceding command statement may be replaced by the two below, and the alias months can be inserted wherever it is needed later:

* Using a character string for months
months:JAN|FEBR|MARCH|APRIL|MAY|JUNE|JULY|AUG|SEPT|OCTOB|NOV|DEC
matrix mymat = [months]


Data in SHAZAM can be represented as either variables or matrices. It is easy to interchange between the two and SHAZAM techniques can accept both. Sometimes you may prefer to use the data as a variable and not as a matrix, particularly when you want to pull out a column of a matrix to a variable.
To convert the vector in the last line to a data variable simply do:

* Convert a vector to a variable
genr myvar = myvec


To convert it back again:

* Convert a variable to a vector
matrix myvec = myvar


Finally, vectors can be easily viewed using the print command:

print myvec


or to print in a single column use:

print myvec / byvar


Note that the data above is not a CSV file; the delimiter used is a semicolon. To work with it, replace the semicolons with spaces using a text editor. Alternatively, the 'Text to Columns' feature in MS Excel can be useful for splitting the data. The file then becomes: testdata.shd

After resolving the delimiter problem, here is a script (with comments) to produce the vector from the above file:

* Read the data into a matrix skipping the names on the first row.
* Note that you need to specify the number of columns.
read(testdata.shd) data / skiplines=1 cols=13
matrix mymat = data

* Calculate the total number of rows to be in the new vector
gen1 length = $rows * $cols

* Set the sample size for the new vector using this length
sample 1 length

* Create the vector from the matrix columns with the matrix command
matrix myvec = vec(mymat)


You may also read the data as variables and use the matrix command to concatenate them. Doing it this way means you don't need to specify the number of columns so it can sometimes be preferable - SHAZAM will instead figure the dimensions out for you. Here is how it would be done with this dataset.

* Read the data using the first row as variable names and list the data
read (testdata.shd) / names list

* Create a matrix by concatenating the variables ('|' does matrix concatenation)
matrix mymat = JAN|FEBR|MARCH|APRIL|MAY|JUNE|JULY|AUG|SEPT|OCTOB|NOV|DEC


If you are doing it this way for many similar files, a useful trick is to create a SHAZAM Character String (akin to an alias) for the data variable concatenation.
You can then substitute this string into a subsequent statement using the format [<alias>]. For example, the immediately preceding command statement may be replaced by the two below, and the alias months can be inserted wherever it is needed later:

* Using a character string for months
months:JAN|FEBR|MARCH|APRIL|MAY|JUNE|JULY|AUG|SEPT|OCTOB|NOV|DEC
matrix mymat = [months]


Data in SHAZAM can be represented as either variables or matrices. It is easy to interchange between the two and SHAZAM techniques can accept both. Sometimes you may prefer to use the data as a variable and not as a matrix, particularly when you want to pull out a column of a matrix to a variable. To convert the vector in the last line to a data variable simply do:

* Convert a vector to a variable
genr myvar = myvec


To convert it back again:

* Convert a variable to a vector
matrix myvec = myvar


Finally, vectors can be easily viewed using the print command:

print myvec


or to print in a single column use:

print myvec / byvar


Note that the data above is not a CSV file; the delimiter used is a semicolon. To work with it, replace the semicolons with either commas or spaces using a text editor. Alternatively, the 'Text to Columns' feature in MS Excel can be useful for splitting the data. The file then becomes: testdata1.csv

After resolving the delimiter problem, here is a script (with comments) to produce the vector from the above file:

* Read the data into a matrix skipping the names on the first row.
* Note that you need to specify the number of columns.
read(testdata1.csv) data / skiplines=1 cols=13
matrix mymat = data

* Calculate the total number of rows to be in the new vector
gen1 length = $rows * $cols

* Set the sample size for the new vector using this length
sample 1 length

* Create the vector from the matrix columns with the matrix command
matrix myvec = vec(mymat)


You may also read the data as variables and use the matrix command to concatenate them.
Doing it this way means you don't need to specify the number of columns so it can sometimes be preferable - SHAZAM will instead figure the dimensions out for you. Here is how it would be done with this dataset.

* Read the data using the first row as variable names and list the data
read (testdata.shd) / names list

* Create a matrix by concatenating the variables ('|' does matrix concatenation)
matrix mymat = JAN|FEBR|MARCH|APRIL|MAY|JUNE|JULY|AUG|SEPT|OCTOB|NOV|DEC


If you are doing it this way for many similar files, a useful trick is to create a SHAZAM Character String (akin to an alias) for the data variable concatenation. You can then substitute this string into a subsequent statement using the format [<alias>]. For example, the immediately preceding command statement may be replaced by the two below, and the alias months can be inserted wherever it is needed later:

* Using a character string for months
months:JAN|FEBR|MARCH|APRIL|MAY|JUNE|JULY|AUG|SEPT|OCTOB|NOV|DEC
matrix mymat = [months]


Data in SHAZAM can be represented as either variables or matrices. It is easy to interchange between the two and SHAZAM techniques can accept both. Sometimes you may prefer to use the data as a variable and not as a matrix, particularly when you want to pull out a column of a matrix to a variable. To convert the vector in the last line to a data variable simply do:

* Convert a vector to a variable
genr myvar = myvec


To convert it back again:

* Convert a variable to a vector
matrix myvec = myvar


Finally, vectors can be easily viewed using the print command:

print myvec


or to print in a single column use:

print myvec / byvar


The data above is not a comma separated CSV file; the delimiter used is a semicolon, which is not supported. To work with it, replace the semicolons with either commas or spaces using a text editor. Alternatively, the 'Text to Columns' feature in MS Excel can be useful for splitting the data.
The data is then saved as comma separated. After replacing the semicolons with commas, the file becomes: testdata1.csv

After resolving the delimiter problem, here is a simple script to produce the vector from the above file:

* Read the data into a matrix skipping the names on the first row.
* Note that you need to specify the number of columns.
read(testdata1.csv) data / skiplines=1 cols=13
matrix mymat = data

* Calculate the total number of rows to be in the new vector
gen1 length = $rows * $cols

* Set the sample size for the new vector using this length
sample 1 length

* Create the vector from the matrix columns with the matrix command
matrix myvec = vec(mymat)


You may also read the data as variables and use the matrix command to concatenate them. Doing it this way means you don't need to specify the number of columns so it can sometimes be preferable - SHAZAM will instead figure the dimensions out for you. Here is how it would be done with this dataset.

* Read the data using the first row as variable names and list the data
read (testdata.shd) / names list

* Create a matrix by concatenating the variables ('|' does matrix concatenation)
matrix mymat = JAN|FEBR|MARCH|APRIL|MAY|JUNE|JULY|AUG|SEPT|OCTOB|NOV|DEC


If you are doing it this way for many similar files, a useful trick is to create a SHAZAM Character String (akin to an alias) for the data variable concatenation. You can then substitute this string into a subsequent statement using the format [<alias>]. For example, the immediately preceding command statement may be replaced by the two below, and the alias months can be inserted wherever it is needed later:

* Using a character string for months
months:JAN|FEBR|MARCH|APRIL|MAY|JUNE|JULY|AUG|SEPT|OCTOB|NOV|DEC
matrix mymat = [months]


Data in SHAZAM can be represented as either variables or matrices. It is easy to interchange between the two and SHAZAM techniques can accept both.
Sometimes you may prefer to use the data as a variable and not as a matrix, particularly when you want to pull out a column of a matrix to a variable. To convert the vector in the last line to a data variable simply do:

* Convert a vector to a variable
genr myvar = myvec


To convert it back again:

* Convert a variable to a vector
matrix myvec = myvar


Finally, vectors can be easily viewed using the print command:

print myvec


or to print in a single column use:

print myvec / byvar


The data above is not a comma, tab or space delimited file; the delimiter used is a semicolon, which is not supported. To work with it, replace the semicolons with either commas, tabs or spaces using a text editor. Alternatively, the 'Text to Columns' feature in MS Excel can be useful for splitting the data. After replacing the semicolons with commas, the file becomes: testdata1.csv

After resolving the delimiter problem, here is a simple script to produce the vector from the above file:

* Read the data into a matrix skipping the names on the first row.
* Note that you need to specify the number of columns.
read(testdata1.csv) data / skiplines=1 cols=13
matrix mymat = data

* Calculate the total number of rows to be in the new vector
gen1 length = $rows * $cols

* Set the sample size for the new vector using this length
sample 1 length

* Create the vector from the matrix columns with the matrix command
matrix myvec = vec(mymat)


You may also read the data as variables and use the matrix command to concatenate them. Doing it this way means you don't need to specify the number of columns so it can sometimes be preferable - SHAZAM will instead figure the dimensions out for you. Here is how it would be done with this dataset.

* Read the data using the first row as variable names and list the data
read (testdata.shd) / names list

* Create a matrix by concatenating the variables ('|' does matrix concatenation)
matrix mymat = JAN|FEBR|MARCH|APRIL|MAY|JUNE|JULY|AUG|SEPT|OCTOB|NOV|DEC


If you are doing it this way for many similar files, a useful trick is to create a SHAZAM Character String (akin to an alias) for the data variable concatenation. You can then substitute this string into a subsequent statement using the format [<alias>]. For example, the immediately preceding command statement may be replaced by the two below, and the alias months can be inserted wherever it is needed later:

* Using a character string for months
months:JAN|FEBR|MARCH|APRIL|MAY|JUNE|JULY|AUG|SEPT|OCTOB|NOV|DEC
matrix mymat = [months]


Data in SHAZAM can be represented as either variables or matrices. It is easy to interchange between the two and SHAZAM techniques can accept both. Sometimes you may prefer to use the data as a variable and not as a matrix, particularly when you want to pull out a column of a matrix to a variable. To convert the vector in the last line to a data variable simply do:

* Convert a vector to a variable
genr myvar = myvec


To convert it back again:

* Convert a variable to a vector
matrix myvec = myvar


Finally, vectors can be easily viewed using the print command:

print myvec


or to print in a single column use:

print myvec / byvar


SHAZAM reads text data easily and it is not necessary to explicitly specify the separator (delimiter) if it is a space, comma or tab. However, the file above uses a semicolon separator (applied automatically by MS Excel in some locales), which is not supported. To work with this data, replace the semicolons with either commas, tabs or spaces using a text editor.
Alternatively, the 'Text to Columns' feature in MS Excel can be useful for splitting the data before saving it as a CSV format file, explicitly selecting the Comma Delimited version if it is available. After replacing with commas, the file then becomes: testdata1.csv

Here is a simple script to produce the vector from the comma delimited (separated) file:

* Read the data into a matrix skipping the names on the first row.
* Note: It is required to specify the number of columns.
read(testdata1.csv) data / skiplines=1 cols=13
matrix mymat = data

* Calculate the total number of rows to be in the new vector
gen1 length = $rows * $cols

* Set the sample size for the new vector using this length
sample 1 length

* Create the vector from the matrix columns with the matrix command
matrix myvec = vec(mymat)


It is also possible to read the data as variables and use the matrix command to concatenate them. Doing it this way means there is no need to specify the number of columns so it can sometimes be preferable. Here is how it would be done with this dataset.

* Read the data using the first row as variable names and list the data
read (testdata.shd) / names list

* Create a matrix by concatenating the variables ('|' does matrix concatenation)
matrix mymat = JAN|FEBR|MARCH|APRIL|MAY|JUNE|JULY|AUG|SEPT|OCTOB|NOV|DEC


When doing it this way for many similar files, a useful trick is to create a SHAZAM Character String (akin to an alias) for the data variable concatenation. This string can be substituted into a subsequent statement using the format [<alias>].
For example, the immediately preceding command statement may be replaced by the two below, and the alias months can be inserted wherever it is needed:

* Using a character string for months
months:JAN|FEBR|MARCH|APRIL|MAY|JUNE|JULY|AUG|SEPT|OCTOB|NOV|DEC
matrix mymat = [months]


Data in SHAZAM may be represented as either variables or matrices (or vectors). It is easy to interchange between the two and SHAZAM techniques can accept both. Sometimes it is preferable to use matrix data as a variable, particularly when you want to pull out a column of a matrix to a variable and don't wish to keep specifying the column index. In the above, to convert the vector to a data variable simply do:

* Convert a vector to a variable
genr myvar = myvec


To convert it back again:

* Convert a variable to a vector
matrix myvec = myvar


Finally, vectors can be easily viewed using the print command:

print myvec


or to print it in a single column use:

print myvec / byvar


The file provided is known as a delimited (separated) Free Format file, and the comma is the most common delimiter used these days. Data can also be read as Fixed Format (the data layout is fixed), where variables are located in precise positions (columns) within a file. To read these in SHAZAM, insert a FORMAT statement (see the chapter Data Input and Output) before the READ statement and add the FORMAT option to the READ statement. Software like SHAZAM can read Fixed and Free Format data faster than any other structure because there is very little overhead in extracting the data compared with, say, reading a spreadsheet, XML or a Recordset from a DBMS, where either a software layer such as a driver or complex parsing code is required to interpret the file. SHAZAM can also read those using the Data Connector and query them using SQL.

SHAZAM reads text data easily and it is not necessary to explicitly specify the separator (delimiter) if it is a space, comma or tab. However, the file above uses a semicolon separator (applied automatically by MS Excel in some locales), which is not supported. To work with this data, replace the semicolons with either commas, tabs or spaces using a text editor. Alternatively, the 'Text to Columns' feature in MS Excel can be useful for splitting the data before saving it as a CSV format file, explicitly selecting the Comma Delimited version if it is available. The file also has some missing values at the bottom and these should be replaced with a missing value code. The default in SHAZAM is -99999. After replacing with commas and adding the missing value code, the file then becomes: testdata1.csv

Here is a simple script to then produce the vector from the comma delimited (separated) file:

* Read the data into a matrix skipping the names on the first row.
* Note: It is required to specify the number of columns.
read(testdata1.csv) mydata / skiplines=1 cols=13
matrix mymat = mydata

* Calculate the total number of rows to be in the new vector
gen1 length = $rows * $cols

* Set the sample size for the new vector using this length
sample 1 length

* Create the vector from the matrix columns with the matrix command
matrix myvec = vec(mymat)


It is also possible to read the data as variables and use the matrix command to concatenate them. Doing it this way means there is no need to specify the number of columns so it can sometimes be preferable. Here is how it would be done with this dataset.

* Read the data using the first row as variable names and list the data
read (testdata.shd) / names list

* Create a matrix by concatenating the variables ('|' does matrix concatenation)
matrix mymat = JAN|FEBR|MARCH|APRIL|MAY|JUNE|JULY|AUG|SEPT|OCTOB|NOV|DEC


When doing it this way for many similar files, a useful trick is to create a SHAZAM Character String (akin to an alias) for the data variable concatenation. This string can be substituted into a subsequent statement using the format [<alias>]. For example, the immediately preceding command statement may be replaced by the two below, and the alias months can be inserted wherever it is needed:

* Using a character string for months
months:JAN|FEBR|MARCH|APRIL|MAY|JUNE|JULY|AUG|SEPT|OCTOB|NOV|DEC
matrix mymat = [months]


Data in SHAZAM may be represented as either variables or matrices (or vectors). It is easy to interchange between the two and SHAZAM techniques can accept both. Sometimes it is preferable to use matrix data as a variable, particularly when you want to pull out a column of a matrix to a variable and don't wish to keep specifying the column index. In the above, to convert the vector to a data variable simply do:

* Convert a vector to a variable
genr myvar = myvec


To convert it back again:

* Convert a variable to a vector
matrix myvec = myvar


Finally, vectors can be easily viewed using the print command:

print myvec


or to print it in a single column use:

print myvec / byvar


When there are a number of missing values and you don't wish to see the warnings, these can be disabled using this command:

set nowarnskip nowarnmiss


It is a good idea not to do this until you are certain the data has been read in correctly. To reenable them further down the script use:

set warnskip warnmiss


The file provided is known as a delimited (separated) Free Format file, and the comma is the most common delimiter used these days.
Data can also be read as Fixed Format (the data layout is fixed), where variables are located in precise positions (columns) within a file. To read these in SHAZAM, insert a FORMAT statement (see the chapter Data Input and Output) before the READ statement and add the FORMAT option to the READ statement. Software like SHAZAM can read Fixed and Free Format data faster than any other structure because there is very little overhead in extracting the data compared with, say, reading a spreadsheet, XML or a Recordset from a DBMS, where either a software layer such as a driver or complex parsing code is required to interpret the file. SHAZAM can also read those using the Data Connector and query them using SQL.


SHAZAM reads text data easily and it is not necessary to explicitly specify the separator (delimiter) if it is a space, comma or tab. However, the file above uses a semicolon separator (applied automatically by MS Excel in some locales), which is not supported. To work with this data, replace the semicolons with either commas, tabs or spaces using a text editor. Alternatively, the 'Text to Columns' feature in MS Excel can be useful for splitting the data before saving it as a CSV format file, explicitly selecting the Comma Delimited version if it is available. The file also has some missing values at the bottom and these should be replaced with a missing value code. The default in SHAZAM is -99999. After replacing with commas and adding the missing value code, the file then becomes: testdata1.csv

Here is a simple script to then produce the vector from the comma delimited (separated) file:

* Read the data into a matrix skipping the names on the first row.
* Note: It is required to specify the number of columns.
read(testdata1.csv) mydata / skiplines=1 cols=13
matrix mymat = mydata

* Calculate the total number of rows to be in the new vector
gen1 length = $rows * $cols

* Set the sample size for the new vector using this length
sample 1 length

* Create the vector from the matrix columns with the matrix command
matrix myvec = vec(mymat)


It is also possible to read the data as variables and use the matrix command to concatenate them. Doing it this way means there is no need to specify the number of columns so it can sometimes be preferable. Here is how it would be done with this dataset.

* Read the data using the first row as variable names and list the data
read (testdata.shd) / names list

* Create a matrix by concatenating the variables ('|' does matrix concatenation)
matrix mymat = JAN|FEBR|MARCH|APRIL|MAY|JUNE|JULY|AUG|SEPT|OCTOB|NOV|DEC


When doing it this way for many similar files, a useful trick is to create a SHAZAM Character String (akin to an alias) for the data variable concatenation. This string can be substituted into a subsequent statement using the format [<alias>]. For example, the immediately preceding command statement may be replaced by the two below, and the alias months can be inserted wherever it is needed:

* Using a character string for months
months:JAN|FEBR|MARCH|APRIL|MAY|JUNE|JULY|AUG|SEPT|OCTOB|NOV|DEC
matrix mymat = [months]


Data in SHAZAM may be represented as either variables or matrices (or vectors). It is easy to interchange between the two and SHAZAM techniques can accept both. Sometimes it is preferable to use matrix data as a variable, particularly when you want to pull out a column of a matrix to a variable and don't wish to keep specifying the column index.
In the above, to convert the vector to a data variable simply do:

* Convert a vector to a variable
genr myvar = myvec


To convert it back again:

* Convert a variable to a vector
matrix myvec = myvar


Finally, vectors can be easily viewed using the print command:

print myvec


or to print it in a single column use:

print myvec / byvar


To disable warnings about missing values use this command:

set nowarnmiss


It is wise not to do this until you are certain the data has been read in correctly. To reenable them further down the script use:

set warnmiss


The file provided is known as a delimited (separated) Free Format file, and the comma is the most common delimiter used these days. Data can also be read as Fixed Format (the data layout is fixed), where variables are located in precise positions (columns) within a file. To read these in SHAZAM, insert a FORMAT statement (see the chapter Data Input and Output) before the READ statement and add the FORMAT option to the READ statement. Software like SHAZAM can read Fixed and Free Format data faster than any other structure because there is very little overhead in extracting the data compared with, say, reading a spreadsheet, XML or a Recordset from a DBMS, where either a software layer such as a driver or complex parsing code is required to interpret the file. SHAZAM can also read those using the Data Connector and query them using SQL.


SHAZAM reads text data easily if the delimiter (separator) is a space, comma or tab. However, the file above uses a semicolon separator (applied automatically by MS Excel in some locales), which is not supported. To work with this data, replace the semicolons with either commas, tabs or spaces using a text editor.
Alternatively, the 'Text to Columns' feature in MS Excel can be useful for splitting the data before saving it as a CSV format file, explicitly selecting the Comma Delimited version if it is available. The file also has some missing values at the bottom and these should be replaced with a missing value code. The default in SHAZAM is -99999. After replacing with commas and adding the missing value code, the file then becomes: testdata1.csv

Here is a simple script to then produce the vector from the comma delimited (separated) file:

* Read the data into a matrix skipping the names on the first row.
* Note: It is required to specify the number of columns.
read(testdata1.csv) mydata / skiplines=1 cols=13
matrix mymat = mydata

* Calculate the total number of rows to be in the new vector
gen1 length = $rows * $cols

* Set the sample size for the new vector using this length
sample 1 length

* Create the vector from the matrix columns with the matrix command
matrix myvec = vec(mymat)


It is also possible to read the data as variables and use the matrix command to concatenate them. Doing it this way means there is no need to specify the number of columns so it can sometimes be preferable. Here is how it would be done with this dataset.

* Read the data using the first row as variable names and list the data
read (testdata.shd) / names list

* Create a matrix by concatenating the variables ('|' does matrix concatenation)
matrix mymat = JAN|FEBR|MARCH|APRIL|MAY|JUNE|JULY|AUG|SEPT|OCTOB|NOV|DEC


When doing it this way for many similar files, a useful trick is to create a SHAZAM Character String (akin to an alias) for the data variable concatenation. This string can be substituted into a subsequent statement using the format [<alias>].
For example the immediately preceding command statement may be substituted for the two below and the alias months can be inserted wherever it is needed: * Using a character string for months months:JAN|FEBR|MARCH|APRIL|MAY|JUNE|JULY|AUG|SEPT|OCTOB|NOV|DEC matrix mymat = [months]  Data in SHAZAM may be represented as either variables or matrices (or vectors). It is easy to interchange between the two and SHAZAM techniques can accept both. Sometimes it is preferable to use matrix data as a variable, particularly when you want to pull out a column of a matrix to a variable and don't wish to keep specifying the column index. In the above, to convert the vector to a data variable simply do: * Convert a vector to a variable genr myvar = myvec  To convert it back again: * Convert a variable to a vector matrix myvec = myvar  Finally vectors can be easily viewed using the print command print myvec  or to print it in a single column use print myvec / byvar  To disable warnings about missing values use this command. set nowarnmiss  It is wise not to do this until you are certain the data has been read in correctly. To reenable them further down the script use: set warnmiss  The file provided is known as a delimited (separated) Free Format file and comma separation is the most common delimiter used these days. Data can also be read as Fixed Format (the data layout is fixed) where variables are located in precise positions (columns) within a file. To read these in SHAZAM insert a FORMAT statement (see the chapter Data Input and Output) before the READ statement and add the FORMAT option to the READ statement. Software like SHAZAM can read Fixed and Free format data faster than any other structure because there is very little overhead in extracting the data compared to say reading a spreadsheet, XML or Recordset from a DBMS - where either a software layer like a driver or a complex parsing code is required to interpret the file. 
SHAZAM can also read those using the Data Connector and query those using SQL. SHAZAM reads text data easily if the delimiter(separator) is a space, comma or tab. However, the file above uses a semicolon separator (applied automatically by MS Excel in some locales) which is not supported. To work with this data, replace the semicolons with either commas, tabs or spaces using a text editor. Alternatively the 'Text to Columns' feature in MS Excel can be useful for splitting data before then saving it as a CSV format file explicitly selecting the Comma Delimited version, if it is available. The file also has some missing values at the bottom and these should be replaced with a missing value code. The default in SHAZAM is -99999. After replacing with commas and adding the missing value code, the file then becomes: testdata1.csv Here is a simple script to then produce the vector from the comma delimited (separated) file: * Read the data into a matrix skipping the names on the first row. * Note: It is required to specify the number of columns. read(testdata1.csv) mydata / skiplines=1 cols=13 matrix mymat = mydata * Calculate the total number of rows to be in the new vector gen1 length =$rows * \$cols

* Set the sample size for the new vector using this length
sample 1 length

* Create the vector from the matrix columns with the matrix command
matrix myvec = vec(mymat)
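
For anyone who would rather script the file clean-up than do it by hand in a text editor, here is a minimal Python sketch of the two ideas above: swapping the semicolon delimiter for commas while filling empty fields with SHAZAM's default missing value code, and stacking the columns into one long vector, column by column, which is what vec() does with a matrix. The function names and the tiny two-month sample are made up for illustration, and the sketch assumes no field contains a quoted semicolon.

```python
import csv
import io

MISSING = "-99999"  # SHAZAM's default missing value code, as noted above


def semicolon_to_csv(text):
    """Convert semicolon-delimited text to comma-delimited text.

    Empty fields are replaced with the missing value code.
    """
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for row in csv.reader(io.StringIO(text), delimiter=";"):
        if not row:  # skip completely blank lines
            continue
        writer.writerow([field.strip() or MISSING for field in row])
    return out.getvalue()


def stack_columns(rows):
    """Stack the columns of a row-major table into one list, column by column."""
    return [value for column in zip(*rows) for value in column]


# Hypothetical sample: two months, with the last FEBR value missing
sample = "JAN;FEBR\n1.5;2.5\n3.5;\n"
converted = semicolon_to_csv(sample)

# Skip the header row, then stack the data columns
data_rows = [line.split(",") for line in converted.splitlines()[1:]]
stacked = stack_columns(data_rows)

print(converted)
print(stacked)
```

The converted text can be saved under any name and read with the SHAZAM script above. The stacking part is only there to show the ordering of the result: all of JAN first, then all of FEBR, and so on.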


It is also possible to read the data as variables and use the matrix command to concatenate them. Doing it this way means there is no need to specify the number of columns so it can sometimes be preferable. Here is how it would be done with this dataset.

* Read the data using the first row as variable names and list the data
read (testdata.shd) / names list

* Create a matrix by concatenating the variables ('|' does matrix concatenation)
matrix mymat = JAN|FEBR|MARCH|APRIL|MAY|JUNE|JULY|AUG|SEPT|OCTOB|NOV|DEC


When doing it this way for many similar files, a useful trick is to create a SHAZAM Character String (akin to an alias) for the data variable concatenation. This string can be substituted into a subsequent statement using the format [alias]. For example, the immediately preceding command statement may be replaced by the two below, and the alias months can be inserted wherever it is needed:

* Using a character string for months
months:JAN|FEBR|MARCH|APRIL|MAY|JUNE|JULY|AUG|SEPT|OCTOB|NOV|DEC
matrix mymat = [months]


Data in SHAZAM may be represented as either variables or matrices (or vectors). It is easy to interchange between the two and SHAZAM techniques can accept both. Sometimes it is preferable to use matrix data as a variable, particularly when you want to pull out a column of a matrix to a variable and don't wish to keep specifying the column index. In the above, to convert the vector to a data variable simply do:

* Convert a vector to a variable
genr myvar = myvec


To convert it back again:

* Convert a variable to a vector
matrix myvec = myvar


Finally vectors can be easily viewed using the print command

print myvec


or to print it in a single column use

print myvec / byvar


To disable warnings about missing values use this command.

set nowarnmiss


It is wise not to do this until you are certain the data has been read in correctly. To re-enable them further down the script use:

set warnmiss


The file provided is known as a delimited (separated) Free Format file, and the comma is the most common delimiter used these days. Data can also be read as Fixed Format (the data layout is fixed), where variables are located in precise positions (columns) within a file. To read these in SHAZAM, insert a FORMAT statement (see the chapter Data Input and Output) before the READ statement and add the FORMAT option to the READ statement.
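
To make the Free Format versus Fixed Format distinction concrete outside SHAZAM, this small Python sketch pulls fields out of a single record by column position rather than by delimiter. The layout (a 4-character year followed by two 6-character values) is invented for the example and is not the layout of the file above; in SHAZAM itself the positions would be described by the FORMAT statement.

```python
# Hypothetical layout: (name, start, end) with 0-based, end-exclusive positions
LAYOUT = [("YEAR", 0, 4), ("JAN", 4, 10), ("FEBR", 10, 16)]


def parse_fixed(line, layout=LAYOUT):
    """Slice a fixed format record into named numeric fields."""
    return {name: float(line[start:end]) for name, start, end in layout}


# One 16-character record: no delimiters, only positions matter
record = "1990  12.5  7.25"
print(parse_fixed(record))
```

Because each field is found by slicing at known positions, no separator has to be searched for, which is part of why this layout is cheap to read.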

Software like SHAZAM can read Fixed and Free format data faster than more structured sources because there is very little overhead in extracting the data compared to, say, reading a spreadsheet, XML or a Recordset from a DBMS, where either a software layer such as a driver or complex parsing code is required to interpret the file. However, SHAZAM can also read those formats using the Data Connector and even query them using Structured Query Language (SQL) to extract and transform data prior to analysis.