Software formatting Australian PCEVN files     07-Dec-2004 /amn
==========================================
(in file: aus-1:/home/pb/max/RunSf-instructions.txt)

We have two work space areas on PC aus-1, the internal (currently four
200GB Maxtor SATA disks) raid0 mounted (automatically via /etc/fstab)
on '/mnt/md2' and the swappable 2--4 PATA disk raid0, mounted manually
after disk swaps on /i1.  Both of these raid0s should have "station
directories" named after two-letter station codes, owned by the
regular user 'pb' so that most of the steps below need not be done as
the 'root' user.

 - internal SATA raid0:  /mnt/md2/Xx, place for copies of originals
 - swappable PATA raid0: /i1/Xx, place for files going to Mk5

IDE DMA
-------
To ensure reliable disk model detection at BIOS level during BIOS
initialization it is important to disable the "IDE DMA transfer
access" in BIOS setup.  (This is among the "Integrated Peripherals"
settings submenu "IDE Function Setup".  On 05-Dec-2004 I set it to
'Disabled' on both aus-1 and aus-2.)

Long IDE cables (even high-quality ones) and IDE disk swap trays do
not always work at the highest Ultra DMA speeds---the "hda" cage of
aus-1 seems especially prone to this problem.  By disabling "IDE DMA
Mode" in BIOS (and by having 'ide=nodma' LILO "append=" option in
/etc/lilo.conf when running LILO to install the kernel boot-up, as has
been set in both aus-1 and aus-2) you ensure that the BIOS talks to
the disks (during disk model detection etc.) in slow PIO mode; also
early in the Linux boot-up the check for raid0 persistent superblocks
is performed safely in slow PIO modes.  The full-speed DMA mode gets
enabled later during Linux boot when '/etc/init.d/hwtools' script is
invoked.
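
Whether the LILO side of this is in place can be checked by looking
for the option in the configuration file.  The following is only a
sketch: the function name 'has_nodma' is made up here, and it assumes
the option sits on an 'append=' line as described above.

```shell
# has_nodma [FILE]: succeed if an 'append=' line in FILE contains ide=nodma.
# FILE defaults to /etc/lilo.conf; the function name is hypothetical.
has_nodma() {
    grep -q '^[[:space:]]*append[[:space:]]*=.*ide=nodma' "${1:-/etc/lilo.conf}"
}

# Example: report the result if the file exists on this machine.
if [ -r /etc/lilo.conf ]; then
    if has_nodma; then
        echo "ide=nodma is set"
    else
        echo "ide=nodma is missing"
    fi
fi
```

Remember that editing /etc/lilo.conf has no effect until 'lilo' has
been run again.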

Step 1, Copying Files
---------------------

0) If the aus-1 PC is running, shut it down by logging in as 'root'
   and running:

   shutdown -h now 

1) Put the 2--4 Australian PATA disks into swappable slots.  (The
   slots become (starting from top downwards) /dev/hdc, /dev/hdd,
   /dev/hda, /dev/hdb.  Put master disks into top/hdc and #3/hda and
   slave disks into #2/hdd and bottom/hdb, see the labelling on the
   right side of the disk carriers.)  Ensure the key lock has been
   turned clockwise to the "locked" position---if not, the drive in
   that slot will not receive power and the green power LED will not
   be lit.

2) Power on and boot the aus-1.

3) Login as 'root' and verify that all inserted disks have been
   recognized.  This is easiest with the command:

   fdisk -l | less

   which will list all the partitions on all inserted disks.  You
   should see (in addition to the internal disks such as /dev/hde)
   disks '/dev/hd{a,b,c,d}', depending on how many you inserted.

   The command 'mdadm -E' will additionally display information about
   how the disks were used in the original raid0 configuration.

   mdadm -E /dev/hda1

   The table at the end of the output of 'mdadm' will have a line
   starting with the word 'this' and followed by the number (0..n)
   which tells the ordinal number of this disk in the original array.
   Repeat the 'mdadm -E' for each /dev/hdX1 you expect to see and make
   a note of the ordering of the disks.

   You could make a list of the ordering with:

   for name in a b c d; do echo -n "$name: "; mdadm -E /dev/hd${name}1 | grep '^this'; done

   I have packaged this command into a script '/root/whatraid' which
   you can invoke after logging in as root:

   ./whatraid
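
   The ordering list can be turned into a suggested assembly command
   along the following lines.  This is a sketch, not the contents of
   '/root/whatraid'; the function name 'suggest_assemble' is made up,
   and it assumes that every line of the list has the form
   "X: this N ..." (i.e. 'mdadm -E' succeeded for every disk listed).

```shell
# suggest_assemble [PARTNO]: read "X: this N ..." lines on stdin and
# print an 'mdadm -A' command with the partitions sorted by ordinal N.
# PARTNO is the partition number to use (default 1).
suggest_assemble() {
    sort -k3,3n | awk -v p="${1:-1}" '
        $2 == "this" {
            sub(/:$/, "", $1)              # "a:" -> "a"
            list = list " /dev/hd" $1 p    # append /dev/hda1 etc.
        }
        END { print "mdadm -A /dev/md0" list }'
}

# Usage (as root), feeding it the output of the loop above:
#   for name in a b c d; do echo -n "$name: "; \
#       mdadm -E /dev/hd${name}1 | grep '^this'; done | suggest_assemble
```

   The command is only printed, not executed, so you can compare it
   with your own notes before running it.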

   You should additionally verify that each of the disks can be
   reliably read without UDMA CRC errors by reading a suitable amount
   of raw data from disks.

   dd if=/dev/hda of=/dev/null bs=4k count=100000

   If CRC errors or read problems appear, follow 'hdparm' instructions
   in step 8) below.
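
   The read test can be repeated for all inserted disks with a small
   loop.  This is a sketch with a made-up function name 'read_check';
   note that a successful 'dd' does not prove the absence of CRC
   errors (the kernel may have retried), so the console or kernel log
   messages still need to be looked at.

```shell
# read_check COUNT DEVICE...: read COUNT blocks of 4k from each DEVICE
# and report whether 'dd' finished without an error.
read_check() (
    count="$1"; shift
    for dev in "$@"; do
        if dd if="$dev" of=/dev/null bs=4k count="$count" 2>/dev/null; then
            echo "$dev: read OK"
        else
            echo "$dev: READ FAILED"
        fi
    done
)

# Usage (as root), same amount of data as the single command above:
#   read_check 100000 /dev/hda /dev/hdb /dev/hdc /dev/hdd
```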

4) If the original Australian disk device names were '/dev/hd[abcd]1'
   it can be that the kernel multi-disk auto-assembly system has
   already recognized the raid.  Check this with:

   cat /proc/mdstat

   If you see a line starting with 'md0 : ...' the raid0 has already
   been started and you can skip the following 'mdadm' command.
   Otherwise "assemble" the raid0 array manually with:

   mdadm -A /dev/md0 /dev/hda1 /dev/hdb1 /dev/hdc1 /dev/hdd1

   listing the disk names /dev/hdX1 in the order which you discovered
   in step 3) when you were checking the output of the 'mdadm -E'
   examine command.  (The './whatraid' script prints out a suggested
   'mdadm -A' command line based on 'mdadm -E' information.)

   Verify that the 'mdadm -A' was successful by checking again that
   you can see a line starting with 'md0 : ...' in the output of 'cat
   /proc/mdstat'.
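
   The same check can be written as a small test that is usable in
   scripts.  A sketch only; the function name 'md_is_active' is made
   up, and the second argument exists just so that a saved copy of
   the status file can be examined instead of the live one.

```shell
# md_is_active [DEVICE [STATFILE]]: succeed if STATFILE has a line
# starting with "DEVICE : ".  Defaults: md0 and /proc/mdstat.
md_is_active() {
    grep -q "^${1:-md0} : " "${2:-/proc/mdstat}"
}

# Usage:
#   md_is_active md0 && echo "md0 is running" || echo "md0 not assembled"
```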

   We have seen partition number 2 being used instead of number 1.
   (This was the case with e.g. Oct-2004 Ceduna disks.)  In this case
   the assembly command would be similar to:

   mdadm -A /dev/md0 /dev/hda2 /dev/hdb2 /dev/hdc2 /dev/hdd2

   Alternatively, you can edit the file '/etc/raidtab' to list subdisk
   names as '/dev/hdX2' instead of '/dev/hdX1' and let the boot-up
   raid auto-mount sequence discover the raid setup.

5) Mount the original raid0 read-only with:

   mount -o ro /dev/md0 /i1

6) Inspect what is on /i1 with regular 'ls -l' commands and discover
   where the set(s) of contiguous PCEVN files are.

7) Check with 'ls' and 'df --si' that the internal SATA raid0 has been
   mounted on '/mnt/md2' and that it has a sufficient amount of free
   space for the files.  If the work directory '/mnt/md2/Xx' does not
   exist yet, create this target "station directory" (please replace
   'Xx' with a two-letter station code in the following):

   mkdir /mnt/md2/Xx
   chown pb.pb /mnt/md2/Xx

   You may want to create a subdirectory just for the raw data files,
   to help unclutter the '/mnt/md2' area:

   mkdir /mnt/md2/Xx/raw
   chown pb.pb /mnt/md2/Xx/raw

   This makes it easier to delete after processing only the raw data
   files and leave control files and run log files around.

8) Copy the files over, tar is probably the fastest way (approximately
   1h30min per 500GB, or 740Mbps):

   (cd /i1; tar cf - .) | (cd /mnt/md2/Xx/raw; tar xpf -)

   Please note how you can use the parenthesized 'cd' commands to
   determine the source and destination directories.  The source
   directory can be the root of the raid0, or there can be a
   subdirectory, named according to Australian conventions.

   Instead of '.' as the source you can use 'xyz*' expressions and
   list multiple directories and files.  (Directories listed here will
   be created inside the copy target directory.)  Please note also the
   'p' in the target extraction tar command: it preserves the original
   file permissions.  The timestamps of the files are retained in any
   case, and since you are running tar as 'root' the files are also
   created with the original ownership (original numeric UIDs).
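
   The tar pipe can be wrapped into a function which refuses to run if
   either directory is missing.  This is a sketch with a made-up name
   'copy_tree'; it uses '&&' instead of ';' after 'cd' so that a
   failed 'cd' can never make tar pack or unpack in the wrong place.

```shell
# copy_tree SRC DST: copy everything under SRC into the existing
# directory DST with a tar pipe, keeping permissions ('p').
copy_tree() {
    [ -d "$1" ] && [ -d "$2" ] || {
        echo "copy_tree: missing directory" >&2
        return 1
    }
    (cd "$1" && tar cf - .) | (cd "$2" && tar xpf -)
}

# Usage (as root):
#   copy_tree /i1 /mnt/md2/Xx/raw
```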

   If you see error messages about "DMA CRC errors" you may have a
   disk drive which is sensitive to IDE cabling.  (Many Samsung and
   IBM/Hitachi drives are prone to DMA problems, especially when they
   are "slave" drives at the middle of an IDE cable, "hdb/hdd".)

   A typical situation is that you see error messages about hdb, "hdb:
   DMA CRC errors" and "disabling DMA".  If DMA gets disabled, copying
   will slow down 5--10x, so it makes sense to interrupt it with
   ctrl-C and slow down the interface speed of the offending drive.

   See what the drive with DMA CRC errors is doing with:

   hdparm -i /dev/hdb

   There will be a line like the following in the output:
---
 DMA modes: mdma0 mdma1 mdma2 udma0 udma1 udma2 udma3 udma4 *udma5 
---
   The asterisk in front of a mode keyword indicates the mode
   currently in use.  'udma6'==133MHz, 133Mbytes/s, 'udma5'==100MHz,
   100Mbytes/s, 'udma4'==66MHz, 'udma3'==??MHz and 'udma2'==33MHz, the
   highest which will work with an old-style 40-conductor IDE cable.
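
   The mode marked with the asterisk can also be picked out
   automatically.  A sketch, with a made-up function name
   'current_dma_mode'; it assumes the 'hdparm -i' output marks the
   active mode as in the example line above.

```shell
# current_dma_mode: read 'hdparm -i' output on stdin and print the
# DMA mode keyword that carries the asterisk (e.g. "udma5").
current_dma_mode() {
    tr ' ' '\n' | sed -n 's/^\*\(.*dma[0-9]\)$/\1/p'
}

# Usage (as root):
#   hdparm -i /dev/hdb | current_dma_mode
```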

   The drive in the example above is running at 'udma5'==100MHz, so
   slow it down to 66MHz with:

   hdparm -c1 -d1 -u1 -m16 -X68 /dev/hdb

   (The lines for permanent settings are in '/etc/init.d/hwtools' and
   you are now manually overriding those for the duration of this
   single boot.)

   The parameter '-X68' in the above is formed by adding the number
   of the UDMA mode to 64; you want udma4, so 4+64==68, i.e. you need
   to use '-X68'.
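
   The arithmetic is simple enough to do in the shell.  A sketch; the
   function name 'udma_xfer_mode' is made up.

```shell
# udma_xfer_mode N: print the hdparm -X value for UDMA mode N (64 + N).
udma_xfer_mode() {
    echo $(( 64 + $1 ))
}

# Usage (as root), slowing /dev/hdb down to udma4:
#   hdparm -X"$(udma_xfer_mode 4)" /dev/hdb
```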

   After slowing down the offending drive you usually have to
   re-enable DMA for its "partner" drive sharing the same IDE cable.
   E.g. when /dev/hdb generates errors the DMA usually gets disabled
   for /dev/hda.  Just invoke a similar hdparm command to the "other"
   drive sharing the same cable, i.e.:

   hdparm -c1 -d1 -u1 -m16 /dev/hda

   Alternatively, you may want to invoke '/etc/init.d/hwtools start'
   to re-initialize all drives to use DMA mode at their highest
   default speed and then just slow down the offending drive with e.g.:

   hdparm -X68 /dev/hdb

   After fixing the speed problem for a pair of drives it is probably
   easiest to delete the files already copied into '/mnt/md2/Xx' and
   start the "tar" copy in step 8) again from the beginning.

9) You will probably have to fix the ownership of the resulting files
   with:

   chown pb.pb /mnt/md2/Xx/raw/*
   (chmod a+r /mnt/md2/Xx/raw/*)

   (This does not affect the timestamps.)

10) When you are happy with the resulting file set(s) ('ls -l
    /mnt/md2/Xx/raw') unmount and shut down:

    df --si
    umount /i1
    umount /mnt/md2
    shutdown -h now

Depending on the available disk space on '/mnt/md2' in the above 'df'
you may want to change disks and repeat this step for another set of
files.


Step 2, Software Formatting
---------------------------

1) Put the 4-disk ("250GB Maxtor") workspace raid0 PATA disks in
   and power on.

2) The 4-disk raid0 should have been automatically recognized via the
   /etc/raidtab file; log in as 'root' and check this with:

   cat /proc/mdstat

   If not, you can use 'mdadm -A' as in Step 1 above to manually fix
   the situation---but please ensure that the four disks really are
   the 4 work PATA disks before doing so!  (You might still have one
   of the original "Australian" disks in...)  You can also edit
   /etc/raidtab if you find that it is in error and then reboot with
   'shutdown -r now'.

3) As 'root', mount the workspace with:

   mount /dev/md0 /i1

   and check ('ls -l /i1') that you have a suitable working station
   directory created and owned by 'pb.pb'.  (If not, create with
   'mkdir /i1/Xx; chown pb.pb /i1/Xx'.)

4) Switch over to a second virtual console with Alt-F2 and login as
   'pb' and perform further work as a regular user.  You may want to
   'cd /i1/Xx' and perform deletions and other cleanup work there and
   maybe check the availability of disk space with 'df --si'.

5) Create a template for the software formatting script control file
   by taking the first line (containing station UTC start time of that
   file) of each file in a file set.  Use something like the following
   (increment the number for later versions of the file, 'sf2.ctrl'
   and so on):

   cd /mnt/md2/Xx/raw
   head -n 1 gg057* > ../sf1.ctrl

   You will get a file with contents like the following:
---
==> gg057a-pks-0000 <==
20040827:135449

==> gg057a-pks-0001 <==
20040827:135459

==> gg057a-pks-0002 <==
20040827:135509
...
---

   You will have to edit this file into the following format:
---
No---- 2004-08-27T13:54:49 gg057a-pks-0000
No---- 2004-08-27T13:54:59 gg057a-pks-0001
No---- 2004-08-27T13:55:09 gg057a-pks-0002
...
---
   Emacs keyboard macros are very handy for this, type 'C-x (' before
   doing the repeated edit for the first time, and when you are done
   with one edit (and have the cursor in the same relative position
   where you had it when you started), close the macro with 'C-x )'
   and try it with 'C-x e'.  If it works correctly, you can repeat the
   macro automatically with 'C-u n n n C-x e' where the 'n n n' are
   digits representing the number of repeats.  (You can specify, say,
   '9999' since when the macro hits the end of buffer it will stop.)
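
   If you prefer not to edit by hand, the conversion can be sketched
   with awk.  The function name 'head_to_ctrl' is made up, and the
   sketch assumes the input looks exactly like the 'head' listing
   above: a "==> name <==" line followed by a line starting with
   YYYYMMDD:HHMMSS.

```shell
# head_to_ctrl: read multi-file 'head' output on stdin and print one
# "No---- YYYY-MM-DDTHH:MM:SS filename" line per file.
head_to_ctrl() {
    awk '
        $1 == "==>" && $3 == "<==" { name = $2; next }
        name != "" && length($1) >= 15 && substr($1, 9, 1) == ":" {
            d = $1
            printf "No---- %s-%s-%sT%s:%s:%s %s\n",
                substr(d, 1, 4), substr(d, 5, 2), substr(d, 7, 2),
                substr(d, 10, 2), substr(d, 12, 2), substr(d, 14, 2), name
            name = ""    # only the first matching line of each file
        }'
}

# Usage:
#   head_to_ctrl < /mnt/md2/Xx/sf1.ctrl > /mnt/md2/Xx/sf2.ctrl
```

   Writing to a new file name keeps the unedited listing around in
   case the result needs to be checked against it.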

6) Take the schedule VEX file and group the files into VEX scans based
   on the start and end times.  (If the times of VEX and times of
   files do not match exactly, take a little bit more at the
   beginning and at the end, a few seconds is fine.)  Leave the
   scan name at 'No----' for those files which do not really belong to
   any scan.  For files belonging to a VEX scan, put the real scan
   name in place of 'No----' at the beginning of lines, e.g. 'No0012'.
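
   Naming the files of one scan can be sketched as follows.  The
   function name 'tag_scan' is made up; the start and end times are
   given by hand in the same format as in the control file (they are
   not read from the VEX file here), and since that format sorts
   correctly as plain text the comparison is a simple string compare.

```shell
# tag_scan NAME START END: read control file lines on stdin and replace
# 'No----' with NAME on lines whose time lies within START..END.
tag_scan() {
    awk -v scan="$1" -v s="$2" -v e="$3" '
        $1 == "No----" && $2 >= s && $2 <= e { $1 = scan }
        { print }'
}

# Usage, widening the VEX times by a few seconds at both ends:
#   tag_scan No0012 2004-08-27T13:54:55 2004-08-27T13:55:05 \
#       < sf1.ctrl > sf1.ctrl.new
```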

7) Insert the software formatter parameters in the beginning of the
   control file.  The parameters end with a single line '$SCANS'.
   Typical parameter examples can be found in previously used control
   files at '/mnt/md2/*/sf*.ctrl'.

   A typical Parkes 512Mbps parameter set can be as follows (this
   formats from internal raid0 into external temp raid0):
---
FORMATMODE=mk4   # or vlba
INPUTDIR=/mnt/md2/Pk/raw
##SKIPBYTES=817  # hdr length in bytes +1(!) in each file
SKIPBYTES=17  # for Mopra or Nov-Parkes
TWOBITSAMPLEMAPPING="1230"  # Pk/Mp etc "1230", "0123" would be no mapping
INBITSTREAMS=16  # input sampled bitstreams
INSAMPLERATE=32  # MHz
OUTTRACKS=64  # output tracks, 64==two heads, 32==single hdstk
OUTVARIANT=1  # variant 0, 1,...
OUTPUTDIR=/i1/Pk
---

   A typical ATCA 256Mbps parameter set can be as follows (this
   formats directly from Australian raid0 into the internal raid0):
---
FORMATMODE=mk4   # or vlba
INPUTDIR=/i1/gg057b
##SKIPBYTES=817  # hdr length in bytes +1(!) in each file
SKIPBYTES=17  # for Mopra or Nov-Parkes or Nov-ATCA
TWOBITSAMPLEMAPPING="1230"  # Pk/Mp/At etc "1230", "0123" would be no mapping
INBITSTREAMS=8  # input sampled bitstreams
INSAMPLERATE=32  # MHz
OUTTRACKS=32  # output tracks, 64==two heads, 32==single hdstk
OUTVARIANT=1  # variant 0, 1,...
OUTPUTDIR=/mnt/md2/At
---

8) Go to directory /home/pb/max with:

   cd ~/max

   There is a Bash script called 'RunSf' which will take the control
   file created in the step above and run the software formatter
   executable '~/proj/vsib/sf' with the command line arguments mangled
   from the control file parameters.

   Sample coding mapping string '0123' would mean "1:1, do not map".

   For Aug-2004 Parkes data we used the following "1230" mapping.  (We
   believe this mapping is correct for Nov-2004 Mopra and Parkes data,
   too.)  This is sometimes referred to as "AT coding".

   "&0x3"  Meaning  Bin --> VLBA   "cx"  "cx"
   C code                   tape   hex   sign
   ------------------------------------------
                     ms       ms
     0      Lo+      00     1 01    A     +2
     1      Lo-      01     2 10    6     -2
     2      Hi+      10     3 11    F     +7
     3      Hi-      11     0 00    1     -7

   The Oct-2004 Ceduna data apparently used the so-called "VLBA
   coding", i.e. apparently directly the VLBA tape codes, as follows:

   "&0x3"  Meaning  Bin --> VLBA   "cx"  "cx"
   C code                   tape   hex   sign
   ------------------------------------------
                     ms       ms
     0      Hi-      00     0 00    1     -7
     1      Lo+      01     1 01    A     +2
     2      Lo-      10     2 10    6     -2
     3      Hi+      11     3 11    F     +7

   So for Aug&Nov-Parkes and Nov-Mopra data we have used the './sf'
   sample coding mapping string of "1230" and the './cxextr' sample
   coding mapping string of "a6f1".  For Oct-Ceduna data, './sf' string
   of "0123" and './cxextr' string of "1a6f" should apparently be used.
   Chris Phillips has indicated that the intent is to use "AT coding"
   everywhere, also in Ceduna, and that Cd-Nov-2004 data most probably
   already uses "1230" "AT coding".
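
   Judging from the two tables above, a mapping string is read so that
   the character at position N (counting from zero) is the output for
   input code N.  This reading is an assumption drawn from the tables,
   not from the './sf' source; the function name 'map_sample' is made
   up.

```shell
# map_sample STRING CODE: print the character of STRING that input
# sample CODE (0..3) is mapped to.
map_sample() {
    printf '%s\n' "$1" | cut -c "$(( $2 + 1 ))"
}

# Usage, printing the "AT coding" table row by row:
#   for code in 0 1 2 3; do
#       echo "$code -> $(map_sample 1230 $code) $(map_sample a6f1 $code)"
#   done
```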

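   The number of input bit streams, the number of output tracks, and
   the last "variant" number (0, 1,...) together decide the track
   mapping performed by './sf'.  The alternatives below are listed as
   four numbers which apparently correspond to the INBITSTREAMS,
   INSAMPLERATE, OUTTRACKS and OUTVARIANT parameters of the control
   file.  These alternatives are available: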
    - '8 32 32 0' 8 bitstreams 1:4 32 tracks variant 0:
      databyte & 0x01 = ch01 : sign : 1 :  2  4  6  8
      databyte & 0x02 = ch01 :  mag : 1 : 10 12 14 16
      databyte & 0x04 = ch02 : sign : 1 :  3  5  7  9
      databyte & 0x08 = ch02 :  mag : 1 : 11 13 15 17
      databyte & 0x10 = ch03 : sign : 1 : 18 20 22 24
      databyte & 0x20 = ch03 :  mag : 1 : 26 28 30 32
      databyte & 0x40 = ch04 : sign : 1 : 19 21 23 25
      databyte & 0x80 = ch04 :  mag : 1 : 27 29 31 33

    - '8 32 32 1' 8 bitstreams 1:4 32 tracks variant 1:
      databyte & 0x01 = ch01 : sign : 1 :  2  4  6  8
      databyte & 0x02 = ch01 :  mag : 1 : 10 12 14 16
      databyte & 0x04 = ch02 : sign : 1 : 18 20 22 24
      databyte & 0x08 = ch02 :  mag : 1 : 26 28 30 32
      databyte & 0x10 = ch03 : sign : 1 :  3  5  7  9
      databyte & 0x20 = ch03 :  mag : 1 : 11 13 15 17
      databyte & 0x40 = ch04 : sign : 1 : 19 21 23 25
      databyte & 0x80 = ch04 :  mag : 1 : 27 29 31 33

    - '16 32 64 0' 16 bitstreams 1:4 64 tracks variant 0:
      duplicate '8 32 32 0' variant above 2x, for head 1 and 2
      dataword & 0x0001 = ch01 : sign : 1 :  2  4  6  8
      ...
      dataword & 0x0080 = ch04 :  mag : 1 : 27 29 31 33
      dataword & 0x0100 = ch05 : sign : 2 :  2  4  6  8
      ...
      dataword & 0x8000 = ch08 :  mag : 2 : 27 29 31 33

    - '16 32 64 1' 16 bitstreams 1:4 64 tracks variant 1:
      duplicate '8 32 32 1' variant above 2x, for head 1 and 2
      dataword & 0x0001 = ch01 : sign : 1 :  2  4  6  8
      ...
      dataword & 0x0080 = ch04 :  mag : 1 : 27 29 31 33
      dataword & 0x0100 = ch05 : sign : 2 :  2  4  6  8
      ...
      dataword & 0x8000 = ch08 :  mag : 2 : 27 29 31 33

    - '16 32 32 0' 16 bitstreams 1:2 32 tracks variant 0:

   The '8 32 32 1' variant (8 bitstreams 1:4 32 tracks variant 1) was
   used successfully to correlate Aug/Sep-2004 Parkes data.  The '16
   32 64 1' variant (16 bitstreams 1:4 64 tracks variant 1) is being
   investigated for Oct-Ceduna and Nov-ATCA data.

   The script is run with the control file created in above steps as
   its first command line argument:

   ./RunSf /mnt/md2/Xx/sf1.ctrl

   (You can time the execution of this (and any other command line) by
   adding '/usr/bin/time' in front of it, e.g. '/usr/bin/time ./RunSf
   /mnt/md2/Xx/sf1.ctrl'.  Typically 512Mbps 16->64 formatting 500GB
   of data takes about 1h20min, that is, 833Mbps.)
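
   The rates quoted in this document follow from the data volume and
   the elapsed time, counting 1GB as 8000Mbit.  A sketch in integer
   shell arithmetic (the result is truncated, not rounded); the
   function name 'mbps' is made up.

```shell
# mbps GB MINUTES: print the average rate in Mbps for moving GB
# gigabytes (10^9 bytes) in MINUTES minutes, truncated to an integer.
mbps() {
    echo $(( $1 * 8000 / ($2 * 60) ))
}

mbps 500 80     # formatting, 1h20min per 500GB
```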

   When run like this, the script extracts (and sorts in alphabetical
   order) all the different scan names in the control file (except it
   omits the scan named 'No----').  For each scan name it then
   re-extracts the data file names (in control file order),
   concatenates the files (removing possible headers in each file),
   and feeds the resulting stream into /home/pb/proj/vsib/sf formatter
   executable with the parameters set by the RunSf script.
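
   The scan name and file name extraction described above can be
   sketched as follows.  This is not the code of 'RunSf' itself, just
   an illustration of the same selection; the function names are made
   up, and the sketch assumes that every line after '$SCANS' has the
   three fields "scan time file".

```shell
# list_scans CTRLFILE: print the distinct scan names found after the
# '$SCANS' line, sorted, leaving out 'No----'.
list_scans() {
    awk 'seen && NF >= 3 && $1 != "No----" { print $1 }
         $1 == "$SCANS" { seen = 1 }' "$1" | sort -u
}

# files_for_scan CTRLFILE SCAN: print the data file names belonging
# to SCAN, in control file order.
files_for_scan() {
    awk -v scan="$2" 'seen && $1 == scan { print $3 }
         $1 == "$SCANS" { seen = 1 }' "$1"
}

# Usage, to preview what would be formatted:
#   for s in $(list_scans /mnt/md2/Xx/sf1.ctrl); do
#       echo "$s:"; files_for_scan /mnt/md2/Xx/sf1.ctrl "$s"
#   done
```

   Running this before the real formatting run is a cheap way to spot
   a misnamed scan in the control file.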

   If you need to reformat a single scan, you can enter the scan name
   (or multiple scan names within single quotes, separated by blanks)
   as a second argument:

   ./RunSf /mnt/md2/Xx/sf1.ctrl No0011

9) The RunSf script creates the formatted output files, one per scan
   in the directory specified in control file parameters.


Step 3, Stuffing the Files into a Mark 5 Disk Pack
--------------------------------------------------

1) Login as 'pb' on aus-1.  Go to directory /home/pb/max/m5tk with:

   cd max/m5tk

   This is the "home" of the Haystack 'm5cmd.pl' eVLBI Perl script.

2) Prepare the list of scan files to be sent to a Mk5.  These are
   probably those just created in the directory '/i1/Xx':

   ls -1 /i1/Xx/No* > /i1/Xx/scans1

   The resulting file should look like:
---
/i1/Xx/No0007.mk4
/i1/Xx/No0008.mk4
/i1/Xx/No0009.mk4
/i1/Xx/No0010.mk4
/i1/Xx/No0011.vlba
/i1/Xx/No0013.mk4
/i1/Xx/No0014.mk4
---
   Be careful not to leave empty lines in or at the end of the file;
   the scripts seem to attempt to transfer empty-named files, creating
   empty scans in Mk5.
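
   If the list has been edited by hand, empty lines can be filtered
   out before use.  A sketch; the function name 'strip_blank_lines'
   is made up.

```shell
# strip_blank_lines [FILE...]: print the input (files or stdin)
# without empty or whitespace-only lines.
strip_blank_lines() {
    grep -v '^[[:space:]]*$' "$@"
}

# Usage:
#   strip_blank_lines /i1/Xx/scans1 > /i1/Xx/scans1.clean
```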

3) Prepare the enhanced "1Gbps-Mk5" unit with the disk pack which is
   going to have these scans appended.  You can 'ssh' directly to this
   Mk5 unit as follows:

   ssh oper@192.168.150.4
   (password asked a few seconds later)
   DirList

   The 'DirList' command shows you the currently stored scans on the
   disk pack mounted and active in that Mk5.  If you need to, for
   instance, erase the contents of the pack, use 'tstMark5A' as
   follows:

   tstMark5A
   > protect=off
   > reset=erase
   ctrl-C   (to get out of 'tstMark5A', not absolutely necessary since
   the following 'm5cmd.pl' will work independently of 'tstMark5A')

4) Start the transfer with:

   ./m5cmd.pl -dst 192.168.150.4 -scans=/i1/Xx/scans1
     -mode=h2m -file2net_exec=./File2net

   (put this on one line).  Transferring to "super-Mk5s" seems to take
   approximately 2 hours per 500GB or about 555Mbps.

   The scripts seem to be highly experimental and print out error
   messages like "Unknown response" etc.  If you get new script
   versions from Haystack / David Lapsley, _please_ check that the
   file 'M5.pm' does _not_ contain any 'rm' delete commands!  (The
   "stock" release deletes all files it has attempted to transfer to a
   Mk5, regardless of whether the transfer was successful or not...)
   Additionally, the main script file 'm5cmd.pl' had the following
   typo:

-----
diff m5cmd.pl.orig m5cmd.pl
74c74
< 	my($prefix, $file2net_exec, $protocol)=("", "", "");
---
> 	my($prefix, $file2net_exec, $protocol)=("", "", "tcp");
-----
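
   The check for delete commands in a new 'M5.pm' can be done with
   grep.  A sketch with a made-up function name 'find_deletes'; in
   addition to 'rm' it also looks for Perl's 'unlink', which is my
   own addition, since a later release might delete files that way.

```shell
# find_deletes FILE: print (with line numbers) the lines of FILE that
# contain the word 'rm' or 'unlink'.  No output means nothing found.
find_deletes() {
    grep -n -w -e 'rm' -e 'unlink' "$1"
}

# Usage:
#   find_deletes ~/max/m5tk/M5.pm
```

   Any line printed should be read in context before the script is
   trusted with files you cannot recreate.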