Saturday, December 15, 2012

data corruption: PacIOOS CTD

Gordon Walker (PacIOOS, University of Hawai`i), Mike Shoemaker (AOML Electronics Tech) and I, Mike Jankulak (AOML, University of Miami), are working together to access the data stored on the PacIOOS CTD.

Gordon purchased a replacement Y-Cable from SeaBird which arrived in Honolulu on December 6th.  He then shipped it to Shoe, who connected it to the recovered CTD so that I could communicate with it.

Communications are possible, but the memory appears to be corrupted.  Attempts to download the stored data have so far yielded nothing but garbage past September 6th, 2011.  That date is just over a year before the sensor's recovery, so we are hoping that more of the data can be salvaged.  Gordon will be consulting with his SeaBird contacts about this issue.

Mike J+

Friday, November 16, 2012

awaiting Y-cable from SeaBird

A brief update:  Gordon Walker (PacIOOS, University of Hawai`i) has placed an order with SeaBird for a replacement Y-Cable that we will be able to use here in Miami for connecting to the PacIOOS CTD and downloading the data from its on-board memory.

Mike J+

Wednesday, October 10, 2012

diagnosis: a dissolved connector pin

Mike Shoemaker makes this report about the likely cause of the station's power problems:
These are the preliminary results of the examination of returned instruments from Saipan. The attached pictures are from the PAC IOOS CT main power Y-cable, which provides power to the instrument and the attached water pump along with communication. The connector that attaches to the fish bite cable adapter has had a pin arced off. This pin is still embedded in the adapter. I speculate that this may have been caused by bringing the instrument up to the surface to disconnect from the station and connect to a laptop for download, then being reconnected without properly drying the connectors and lubricating with silicone to form a proper seal. In addition to these problems, the strain relief connector is missing from the main fish bite cable; this could also be the cause of the seawater intrusion, given that the main fish bite cable megs out at infinity but when the adapter is attached it megs out at 1.5 megohms.

The BIC_DP cable megs out at anywhere from 400 to 600 megohms depending on which of the 4 conductors you attach to, and the CTD_DP cable megs out at 200 megohms across all 4 conductors. Based on the earlier e-mail from Mr. Benavente about the three-prong spears used by local fishermen, I speculate that the cables were punctured by these and that seawater intrusion has caused these high-resistance short circuits.

My next step will be to see if Mr. Bishop has any of the Y-cables left from the SeaBird CTDs that we transferred to his group, so that we can test the PAC IOOS CT and download Gordon Walker's data, along with testing the other returned underwater instruments. That is, unless Gordon can expedite a shipment of a cable from his spares.
As Shoe explains, we now believe with a high degree of certainty that the station power problems were caused by a connector failure.  The ground pin from the PacIOOS Y-Cable is missing in the connector that attaches to the CREWS pigtail adapter.  Both this Y-Cable (a SeaBird part) and the pigtail adapter (custom-made by CREWS) must be replaced.

Next steps will include downloading the PacIOOS on-board memory and re-testing the "brain" components.  Because the retrieved underwater sensors were deployed on the station for more than a year, they will now be sent back to their manufacturers for evaluation and recalibration, and the station will next go live with its alternate set of sensors (most of which have been in storage in Saipan since August of 2011).

Clicking on any of the images below should load a full-sized version.

Mike J+



Tuesday, October 2, 2012

station equipment reaches Miami

Following the last blog update on September 5th (which mentioned how the station had been completely offline since August 24th), the station resumed its brief sunlit periods of activity for a few hours a day until September 19th.  The last coherent report from the PacIOOS CTD, however, occurred on September 10th.

I'm not sure of the details of timing and personnel, but I believe David Benavente and Steven Johnson were involved in an operation on September 27th, 2012, to retrieve all of the station underwater sensors and cables as well as the "brain" package.  These, along with the "groundtruth" CT, were packed up and shipped to AOML in Miami on September 28th.  The shipment arrived here on October 1st, 2012.

David Benavente had made the following observation about removing the instruments from the station:
So just thought you should know. While I was attempting to shut down the PACIOOS Seabird I removed the male/female plugs to connect it to the serial port. I noticed that one of the copper nodes on the male side had corroded off. I'm not sure if it's the ground wire, but I thought I should let you know. I decided to remove the batteries from the PACIOOS CTD so that it cannot continue to collect data.
Mike Shoemaker will be leading the effort to diagnose the power losses that have plagued the station in the past few months.

Mike J+

Thursday, September 6, 2012

new plan: equipment recovery for analysis in Miami

As of today, September 5th, 2012, the decision has been made to recover all of the station equipment and ship it to Miami for analysis and repair.  An alternative plan to have Mike Shoemaker fly out to Saipan and work on site with Steven and David has been set aside due to budgeting and logistics difficulties.

After the last blog update, the station continued to come alive for a few hours a day until August 24th, at which time it went entirely offline and remains offline as of the writing of this update.

Mike J+

Wednesday, August 15, 2012

info from brief resumptions of communications

In the last few days, from August 10th - 13th, there have been four brief periods when communications with the Saipan station have resumed.  This post explains what we have learned from the existence and the contents of those communications.

First let's look at the timing and duration of these periods, speaking in local Saipan time:
  • Friday, August 10th, 7:18 am - 7:42 am (5 records)
  • Saturday, August 11th, 1:30 pm - 4:48 pm (34 records)
  • Sunday, August 12th, 11:24 am - 11:30 am (2 records)
  • Monday, August 13th, 9:42 am - 1:54 pm (33 records)
The record counts I've listed are from the 6-minute data table.

First of all, note that every record in a data table is given a sequential record number.  This makes it possible to identify cases where records are (or are not) consecutive.  Because of this we can say with high probability that the station's datalogger was completely non-functional for the period from July 19th to August 9th (times in UTC).  There is no chance that this could be yet another communications-only failure with little other impact on station operations.
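To make that check concrete, here is a minimal Python sketch of the idea, assuming the 6-minute table has been exported to CSV; the column names ("TIMESTAMP", "RECORD") and the file name are placeholders for illustration, not our actual processing code.

import csv
from datetime import datetime, timedelta

def classify_gaps(csv_path, time_field="TIMESTAMP", rec_field="RECORD",
                  interval=timedelta(minutes=6)):
    """Scan a 6-minute table export for breaks in the time series and report
    whether each break looks like a download problem or a period when the
    logger was not running.  Field names and file layout are assumptions."""
    prev_time = prev_rec = None
    with open(csv_path, newline="") as f:
        for row in csv.DictReader(f):
            t = datetime.strptime(row[time_field], "%Y-%m-%d %H:%M:%S")
            r = int(row[rec_field])
            if prev_time is not None and t - prev_time > interval:
                if r == prev_rec + 1:
                    # Record numbers are consecutive across the time gap, so
                    # the logger simply stored nothing: it was not running.
                    kind = "logger offline"
                else:
                    # Record numbers jump: the records exist in the logger
                    # but are missing from this download.
                    kind = "records not downloaded"
                print(f"{prev_time} -> {t}: {kind}")
            prev_time, prev_rec = t, r

classify_gaps("six_minute_table.csv")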

The pattern of communications is somewhat consistent with the explanation that, following a severe power loss that to some extent drained the station's rechargeable batteries, the station may be recharging itself and may briefly resume operations during daylight hours.  However, there is still no explanation for the initial power loss, or for the previous power drops (May 19th - June 4th, June 26th - July 5th) seen in the data record.

Perhaps worse, this pattern suggests that the intermittent voltage drop may not be over.  Normally if a short-circuit is repaired we would expect to see better and longer "ontimes" with each passing day.  Instead we start with a brief uptime in the early morning (followed by silence during that day's prime daylight hours).  We also see a stronger performance Saturday followed by a weaker performance Sunday.  Monday's communications are longer but they describe power levels that are getting lower throughout the morning and early afternoon.  And on Tuesday, which has already ended in Saipan, there was no resumption of communications at all.

Another problem is that many of the station's instruments are now malfunctioning.  This may be due to electrical problems (an artifact of running them at very low voltages), or their calibration files and settings may have become corrupted, leading them to produce reports in a format that the datalogger program was not designed to parse.  A brief rundown of the state of station instruments follows:

The standalone air temperature sensor, barometer, anemometer and electronic compass all appear to be working normally.  These are all analog instruments and do not depend on serial communications in any way.

Both light sensors are damaged.  The surface light sensor continues to produce serial reports of some kind (as evidenced by that sensor's instrument "counts" in the logger) but these reports are apparently filled only with zeroes, day or night, including for temperature and voltage.  The underwater light sensor has been offline since last October and this has not changed.

The Deep CTD (Teledyne) seems to be at least partially operational.  Its conductivity and temperature readings seem reasonable, although its depth readings show an odd 30-cm shift in the past few days, which may indicate a problem.

The PacIOOS CTD was producing data reports on Friday and Saturday but since that time its output has been in a format that was unrecognized by the datalogger programming.  It produced a full "status" report (in response to hourly prompting by the datalogger) on August 11th, 2012, at 4:02 UTC.  This report's contents were as follows:
  • Year: 2012
  • Month: 8
  • Day: 11
  • Hour: 4
  • Minute: 2
  • Second: 56
  • Serial Num: 1606481
  • Num Events: 15
  • Volts Main: 7.3
  • Volts Lith: 8
  • Curr Main: 61.3
  • Curr Pump: 283.4
  • Curr Ext: 286.8
  • Mem Bytes: 1082107
  • Samples: 56953
  • Sample Free: 3406107
  • Sample Len: 19
  • Headers: 4
The Vaisala WXT, like the surface light sensor, is apparently producing reports but as recorded by the datalogger these reports are all zeroes.

It is possible that there has been damage to the datalogger, memory unit, serial port units, radio or cellular modem, although so far there is no sign of such damage.

The main problem at this point is not the instrument failures but isolating the cause of these voltage drops.  It seems that there have been voltage drops since mid-May, that a severe voltage drop has rendered the station non-functional for three weeks, and that this may be an intermittent problem that is still ongoing.  If we knew or suspected the cause of the power issues then it would simply be a matter of replacing the station instruments with the spares in storage at CRM.  However, our first focus must be on diagnosing the underlying cause of the power losses and then repairing it.

Mike J+

Tuesday, August 7, 2012

Maintenance log: Launched CRM vessel from shore

[From David Benavente's LAOLAO Bay ICON Station Maintenance log -- August 6, 2012:]

Launched CRM vessel from shore. Station was climbed by David B. Upon opening the brain housing all wires and connection plugs were first inspected. No sign of damage was found. Next, Brain hardware was visually inspected. Data logger lights, modem lights, and power supply lights were all illuminated. The inspection did not reveal any wiring problems to the brain.
Underwater cleaning and maintenance of station instruments were also conducted during this visit.


Tuesday, July 24, 2012

Station Power Failure

The Saipan CREWS station went offline as of 22:36:35 UTC on Thursday, July 19th, 2012.  In local Saipan time this is Friday morning, July 20th, at 8:36 am.  Miami time, it is Thursday night, July 19th, at 6:36 pm.

Gordon Walker (who is taking Ross Timmerman's position with PacIOOS) sent out this information on Monday, July 23rd:
We were able to connect to the modem at 9:30 am (local Saipan time) and everything looked okay except for the voltage.  It is at 5.76 V.  It seems as though the power for the entire array is extremely low.  This may be the reason for the interruption.  We will continue to monitor the modem on our end.
Unlike previous outages, which were believed to be primarily caused by modem or cellular network problems, this latest outage is clearly due to a loss of power on the station.  Consider the graph of minimum (blue), average (red) and maximum (green) hourly datalogger voltages, plotted against Julian Day, for all of 2012, at right (click on the image for a larger version).

A CREWS station's normal operating voltage cycles between 12.5 V and 14 V, with peak voltages during the sunlight hours when the solar panels are supplying power and lowest voltages overnight when they are not.

A close examination of this station's power levels in 2012 shows three periods where the station did not appear to be reaching its usual daylight voltage peaks.  The first occurred from roughly May 19th to June 4th; the second from June 26th to July 5th; and the third began on July 12th and ended with the station's loss of power.
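As an illustration of that kind of scan, here is a small Python sketch that flags days whose daytime voltage peak never reaches a healthy level.  It assumes an hourly CSV export; the column names and the 13-volt threshold are my placeholders, not the exact values used in our processing.

import csv
from collections import defaultdict
from datetime import datetime

PEAK_THRESHOLD = 13.0   # assumed: a healthy daylight peak sits well above 13 V

def low_peak_days(csv_path, time_field="TIMESTAMP", vmax_field="BattVolt_Max"):
    """Group hourly maximum battery voltages by day and flag days whose
    daytime peak never reaches the threshold.  Column names are assumptions."""
    daily_peak = defaultdict(float)
    with open(csv_path, newline="") as f:
        for row in csv.DictReader(f):
            day = datetime.strptime(row[time_field], "%Y-%m-%d %H:%M:%S").date()
            daily_peak[day] = max(daily_peak[day], float(row[vmax_field]))
    return [day for day, peak in sorted(daily_peak.items()) if peak < PEAK_THRESHOLD]

for day in low_peak_days("hourly_table_2012.csv"):
    print(day, "daytime peak below", PEAK_THRESHOLD, "V")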

That third drop, where the station power level fell below 10 V before communications with the station entirely ceased, has happened before in CREWS history, though it is rare.  It happened once in Jamaica, in June/July 2008, when that station's "deep" light sensor's bulkhead connector was broken (we think this was caused by a period of strong currents and the sensor's cable being insufficiently tied down).  In that case the failed sensor was removed after two days, and the station resumed communications about 14 days later when its batteries were again sufficiently charged.  More or less the same thing happened in Puerto Rico, twice, once in April of 2010 and once in July of 2011.  Both of these latter outages were caused by flooded light sensors, at least one of which was caused by a puncture in the sensing surface of the instrument.

Before I continue, I'll just emphasize that what happened in Saipan on July 19th is nothing like the outages that this station has experienced before -- the first failure on October 2nd, or the more recent outages (April 12th - 19th, those few days in early May, and May 12th - 29th).  Those were quite clearly caused by problems with the cellular modem or the Docomo network.  They were communications outages, only.  After communications resumed, the data record showed that the station continued to operate (and store data locally) after we lost contact.  [In the October incident, the station continued to operate only for two more days, when a blown fuse up top took it out completely.]  In this case, to be clear, we believe all station functions to be non-operational, with the possible exception of the battery-powered PacIOOS CTD.

However, this current outage is somewhat different from the power losses we've seen before at Jamaica and Puerto Rico, for two reasons.  First, in the current case there is a strange history of lower voltage levels for days or weeks at a time, which were then followed (the first two times) by a return to normal power levels.  This current problem, it seems, is somehow intermittent, which does not lend itself to explanation by something irreversible like a flooded instrument.

The second difference here is that those previous incidents left enough clues for us to determine with high probability what had failed.  Specifically, in the previous cases there was a measured voltage drop in one instrument only, which was then followed (in a matter of days, or hours) by the station's complete power loss.  This current Saipan power loss does not include any such hints in the power levels of its instruments.  This may be because one of those instruments (the underwater light sensor, as it happens) is already incommunicado.  Perhaps that light sensor has a loose wire in its communications but continues to take a full power/ground feed from the station.  If so, it could flood without warning and cause the failure of the entire station, as happened in Jamaica and Puerto Rico.

However, those previous periods of voltage loss and recovery remain unexplained, and this suggests that a flooded instrument may not turn out to be the cause in this case.

Mike J+

Friday, July 6, 2012

Maintenance log: clean SeaBird lens cleaner

[From David Benavente's LAOLAO Bay ICON Station Maintenance log -- July 5, 2012:]

Objective was to revisit the station and thoroughly clean the SeaBird lens cleaner. David B. had a scheduling conflict, so he joined the group later. On this visit copper screens were replaced. They appeared to be punctured; the puncture marks looked as if they had come from a three-prong spear, commonly used by fishermen.

Wednesday, June 27, 2012

Maintenance log: Accessed Station through cliff

[From David Benavente's LAOLAO Bay ICON Station Maintenance log -- June 26, 2012:]

Accessed station through the cliff. David B. used SCUBA, while Rod C., John I., and Steven J. snorkeled. Found some evidence that fishermen had been in close vicinity to the station. Also, a knife was found below the station; this could be because the base of the station appears to have become a congregation area for Trochus mussels. Because these are a harvested species, fishermen may be visiting the station more often.

Saturday, June 23, 2012

Docomo cellular connection spotty, error-prone

Following the recent modem outage (May 12th - 29th), I attempted my usual practice of a gradual re-enabling of the data table downloads.

A note about data tables:  the datalogger has six data tables, each with its own collection schedule.  One very simple table stores one record per day, with only a few data values that are used to monitor the memory card's diagnostics and confirm that it is formatted/storing correctly.  At the other end of the spectrum, one data table stores a new record every five seconds, to record each individual reading from the analog sensors (air temperature and barometric pressure).

Historically most of the CREWS data record has come from the one-hour data table, since most CREWS stations have relied on once-hourly, twenty-second windows of communication on the GOES East satellite.  Data were summarized as necessary and transmitted in a format designed to fit comfortably in our communications window.

With the advent of larger-capacity memory cards, we began to store more granular records in the datalogger's memory for later retrieval.  With our "always on" connection to the Saipan CREWS station (apart from the cellular network problems, that is), we began to access all of those data in real time.

As might be expected, the 5-second data table contains a very large number of records, although each record stores relatively few data values.  There is also a 30-second, a 1-minute, a 6-minute, a 1-hour and (as mentioned before) a 1-day data table.
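For a rough sense of the volumes involved, this short Python sketch computes how many records each table accumulates per day from the collection intervals just described (the table names here are informal labels, not the logger's own table names):

from datetime import timedelta

# Collection intervals as described above; the names are informal labels.
TABLE_INTERVALS = {
    "5-second":  timedelta(seconds=5),
    "30-second": timedelta(seconds=30),
    "1-minute":  timedelta(minutes=1),
    "6-minute":  timedelta(minutes=6),
    "1-hour":    timedelta(hours=1),
    "1-day":     timedelta(days=1),
}

for name, interval in TABLE_INTERVALS.items():
    records_per_day = timedelta(days=1) // interval
    print(f"{name:>9} table: {records_per_day:6d} records per day")

At those rates the 5-second table alone accumulates 17,280 records per day, which is part of why the more granular tables are the ones deferred when the modem link is unreliable.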

Since our data feeds require only the 1-hour and 6-minute data tables for processing, I generally disable the download of all other data tables during long modem outages, so that when service resumes our feeds can come back up as quickly as possible.  Later I will manually re-enable download of the larger data tables and monitor their progress to ensure that our main data feeds aren't negatively impacted.  This was more or less how things worked following the April 12th - 19th outage earlier this year, with all data tables recovered by April 24th.

However, after the May 29th re-establishment of communications with the modem, two differences were observed.  For one, the LoggerNet monitoring software was reporting a large number of error conditions and failed connections, although this did not seem to be impacting the download of the 1-hour or 6-minute data tables.  Worse, however, attempts to re-enable the download of the other data tables appeared to knock the modem entirely offline, requiring a request to Docomo for a modem/network reset.

The first such incident happened on the morning of Wednesday, May 30th, 2012.  I re-enabled the first of the data tables for download, and all communications ceased.  I alerted Ross and he replied later that same day:
I sent another email to Luigi [at Docomo] this morning to reset the modem. I will take down the modem settings and have Sierra Wireless take a look. It isn't clear whether the Docomo network or the device is the cause of these outages. No reports of network issues have been received, but as David mentioned, they can be frequent. I don't quite understand how we had good service for the initial months and not now.
This apparently got a response from Docomo immediately and the modem was back online again the next morning.  However, it once again went offline when I attempted to re-enable download of the other data tables.  From my message to Ross on May 31st:
You might want to check the modem again today!  It seems like it might be locking up whenever I try to pull data files from the logger.  I don't know why it would suddenly be reacting this way because this was S.O.P. last year (August/Sept) and during the uptimes earlier this year (March 19th through May 12th). And most notably on April 24th, when I did some very intensive data transfers from the modem and everything seemed fine.
This message may have caught Ross away from his email, for his reply arrived on June 4th (wherein he said he would contact Docomo) with an update on June 5th (to report that the modem was online again).

Following this clear pattern of outages, I did not attempt to access the other data tables right away.  Instead, I allowed the main 1-hour and 6-minute data tables to run normally and populate our data feeds.  However, several weeks later I glanced at the LoggerNet status readouts and noticed that the pattern of error conditions and failed connections for the Saipan modem appeared to have resolved itself.  All communications appeared to be normal (for example, in comparison with our other cellular modem, which is deployed at Port Everglades near Fort Lauderdale, FL).

When I noticed this change, I re-enabled download of the other data tables and this time they all downloaded beautifully.  This caught us up, for the first time, back to the beginning of the May 12th outage, and we were once again keeping current with all data tables in real time.  I sent out this update by email on Friday, June 22nd:
LoggerNet's error rate on the docomo modem suddenly relaxed this past week and I took a chance and re-enabled the download of those more granular files I'd mentioned.  That worked beautifully, so we're all caught up with the missing data back to May 12th, and all files are once again downloading in realtime. I don't know what was causing the modem to cycle offline before but it seems to be okay for now.
Mike J+

Friday, June 1, 2012

Maintenance log: May maintenance

[From David Benavente's LAOLAO Bay ICON Station Maintenance log -- May, 2012:]

I was away on travel for most of May. To my understanding the MMT performed regular maintenance on the ICON station during this time. Steven J. reported that the station and its instruments were cleaned.

Thursday, May 31, 2012

outage for more than two weeks, Docomo unresponsive

[On May 12th, 2012, the cellular modem went offline again.  This is an email I sent out on Monday, May 14th:]
This morning I noticed that it's offline again.  The modem's been unreachable since Saturday afternoon UTC (that's Saturday night Saipan time, Saturday morning Miami time, and early in the morning Saturday Honolulu time).
[This time, Ross had a lot of difficulty getting the attention of the Docomo people.  This is from his email on May 14th:]
I did in fact notice the outage. I just sent a note to Docomo to check.
[And then on May 21st:]
The modem is still unreachable and I have received no response from Docomo from my 5/14 email. I just emailed them again today.
[And finally, success, on Tuesday, May 29th:]
My request to Docomo finally came through (after a third email). The modem is back online now. It is still unclear why the modem connection keeps dropping. The only known way to remotely reset it is through Docomo. I'll continue keeping tabs on the network status.
[Later Ross speculated about the reason why Docomo took so long to respond to his May 14th message.  This is from an email dated May 30th:]
My impression of the Docomo situation is that we were put on the back-burner. My email yesterday contained a bit more pressure and that did the trick. Also, the account manager (Kevin McCale) was on leave for a couple weeks and the technician (Luigi Onglao) cc'd responded to him, but not me. Luigi thought that Kevin had responded. Just a miscommunication. They are both usually good about responding promptly.
[However, the modem connection continued to be spotty and error-prone following this reset until about June 22nd (for details, see the blog entry dated June 23rd).]

Mike J+

Tuesday, May 15, 2012

another Docomo outage (for a few days)

[From an email I sent out on Monday, May 14th, 2012:]
I noticed last week that the feed from the LLBP7 station had been interrupted again.  Last week I was hugely busy, however, and by the time I sat down on Friday to send out an alert email like this one, the data feed was working again.  I think it was probably offline for about two or three days.
[From a reply by Ross Timmerman, later that same day:]
Last week the modem was offline a few days and I reported it to Docomo. They responded fairly quickly, asking me to try connecting again, and it was fixed. I asked Docomo what they did but they never elaborated, nor did they explain why the outage occurred. I will keep an eye out for connection issues and keep Docomo posted.

Friday, May 4, 2012

detailed discussion of PacIOOS/datalogger logic

On April 26th, Ross Timmerman contacted me about a "data alignment" problem he was experiencing with the PacIOOS data.  The PacIOOS CTD's data are recoverable in two ways.  The first way is from its real-time reports written from its serial port to the listening datalogger, which are then parsed, stored in datalogger memory, and included in data table downloads (via the cellular modem) from the datalogger.  These downloads are then written to a file that is monitored by a Java program supplied by PacIOOS.  Note that there is a known problem in this Java program that caused the loss of one or two CTD records per hour, although all of those records are archived here in Miami.
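The monitoring program itself is a Java application supplied by PacIOOS, so its internals aren't reproduced here; purely as an illustration of the general follow-a-growing-file pattern, here is a hypothetical Python sketch (the file name is made up, and the real program's behavior may well differ).

import time

def follow(path, poll_seconds=5.0):
    """Yield lines as they are appended to a growing data file, the general
    pattern of a file-monitoring feed program; this sketch and the file name
    below are illustrative only."""
    with open(path) as f:
        f.seek(0, 2)                      # start at the current end of file
        while True:
            line = f.readline()
            if line:
                yield line.rstrip("\n")   # a newly appended CTD record
            else:
                time.sleep(poll_seconds)  # wait for the next table download

for record in follow("llbp7_pacioos_feed.dat"):
    print(record)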

The second way to recover the PacIOOS CTD's data is by direct download from its on-board memory.  I never got confirmation of when those data were downloaded by David and Steven, but from a blip in the data feed on March 26th, 2012, I suspect the download may have happened on that day.

The problem as Ross reported it was that although the measurements from these two sources were exactly the same, the timestamps were not.  Timestamps on the data captured by the datalogger and accessed by modem were consistently ahead of those downloaded directly from the CTD, by seventeen minutes and one second.  Ross contacted me to ask whether this time offset might be caused by something in our data transfer or parsing routines.
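For anyone who wants to reproduce the comparison, a minimal Python sketch follows: it pairs records from the two sources by their measurement values and reports the timestamp difference for each pair.  The column names and CSV layouts here are assumptions for illustration, not Ross's actual files.

import csv
from datetime import datetime

def timestamp_offsets(modem_csv, memory_csv,
                      key_fields=("temperature", "conductivity", "pressure"),
                      time_field="timestamp"):
    """Pair records from the two recovery paths by their measurement values
    and return the timestamp difference (modem copy minus memory copy) for
    each pair.  Column names and file layouts are assumptions."""
    def load(path):
        with open(path, newline="") as f:
            return {tuple(row[k] for k in key_fields):
                    datetime.strptime(row[time_field], "%Y-%m-%d %H:%M:%S")
                    for row in csv.DictReader(f)}
    modem, memory = load(modem_csv), load(memory_csv)
    return [modem[key] - memory[key] for key in modem.keys() & memory.keys()]

for delta in timestamp_offsets("pacioos_via_modem.csv", "pacioos_memory_download.csv"):
    print(delta)   # in the case Ross reported, consistently 0:17:01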

My answer (sent May 3rd) was basically that I did not see how this could be possible.  The CTD reports a timestamp to the logger, and the logger breaks it up into numbers and stores these in separate memory locations with no "awareness" that they are date and time values.  In order for some step in the process to add 17:01 to each timestamp, it would need to correctly handle hour (and day, and so on) boundaries in multiple memory locations.

I am reproducing large portions of my email reply below, because they provide an insight into how the PacIOOS CTD reports its data and status messages, how these messages are parsed by the datalogger, and the form in which those data are recovered by our systems in Miami.  [editor's note: as of August 14th, 2012, there is still (to my knowledge) no explanation of the data alignment problem reported by Ross in April.]

[email excerpt begins here:]

I think now I at least understand the general description of the problem.

Unfortunately, I can't see how this could be happening on this end of the equation.  Let me line up some evidence for you by describing what our logger/processing is like.

This is a sample record from the PacIOOS CTD produced when you were visiting AOML in March of 2010:

# 23.2650,  0.00001,    0.031, 0.0658, 0.1418,   0.0114, 1491.963, 08 Mar 2010 13:12:29
The logger program distinguishes one of these "data" lines from the status message lines because it begins with # (status message lines begin with <). Then it "splits" the line into separate fields based on CR, LF, space, comma and colon characters (producing 13 fields in all).  The ninth field, a 3-character month, is converted to a number between 1 and 12 according to a program "case" statement (and it would get -1 if it didn't match any of them).

These thirteen values are then stored in thirteen different memory locations in the logger memory.  This is the only place in the logger program where these date/time values are written.  Note, too, that the logger doesn't "know" that these are date and time fields, so it would take quite a lot of logic for it to apply a consistent time offset to these values.  In order to add seventeen minutes to every measurement, it would have to adjust the hour, day, month and even year values whenever the added minutes rolled past an hour boundary.  But the program is "dumb" enough that it only reports what it sees in the CTD reports.
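[editor's note: what follows is a rough Python paraphrase of the parsing step described above.  The real logic lives in the datalogger's own program, so the field names and ordering here reflect my reading of it, nothing more:]

import re

MONTHS = {"Jan": 1, "Feb": 2, "Mar": 3, "Apr": 4,  "May": 5,  "Jun": 6,
          "Jul": 7, "Aug": 8, "Sep": 9, "Oct": 10, "Nov": 11, "Dec": 12}

def parse_ctd_data_line(line):
    """Split a '#'-prefixed CTD data line on space/comma/colon (and CR/LF)
    into its 13 fields and convert the 3-character month to a number."""
    stripped = line.strip()
    if not stripped.startswith("#"):
        return None                                   # not a data line
    fields = [f for f in re.split(r"[\r\n ,:]+", stripped[1:]) if f]
    assert len(fields) == 13
    temp, cond, press, chloro, turbid, sal, sv, day, mon, year, hh, mm, ss = fields
    month = MONTHS.get(mon, -1)                       # the "case" statement
    # Thirteen separate values; the logger has no notion that six of them
    # form a timestamp.
    return [int(year), month, int(day), int(hh), int(mm), int(ss),
            float(temp), float(cond), float(press), float(chloro),
            float(turbid), float(sal), float(sv)]

print(parse_ctd_data_line(
    "# 23.2650,  0.00001,    0.031, 0.0658, 0.1418,   0.0114, 1491.963, 08 Mar 2010 13:12:29"))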

Let's add some more evidence.  I'm focusing on that record highlighted in your first message, "22 Sep 2011 12:01:06".  I went back to the raw table-data downloads from the logger, and here is the relevant data record:

"2011-09-22 12:06:00",7461,2011,265,1206,12.69,12.69,12.7,4316,28.62,1,72,1,30,915,27.99,410.9,"2011-09-22 12:02:00",411,8.23,"2011-09-22 12:05:50",4.912,"2011-09-22 12:06:00",2.777,157.9,318.4,-0.001,0,-0.001,0.066,12.63,25.12,12,0.006,-0.002,0,0.372,12.58,26.06,12,56.44,29.34,4.13,1543,34.17,21.8,1,2011,9,22,12,1,6,29.3537,5.63757,2.333,0.0643,0.0784,34.1177,1543.372,2011,9,22,12,3,39,1606481,4,12.1,8.7,61.3,137,21.9,138662,7298,3455762,19,3,0,0,23,1,1,0,0,0,0,0,56.71,29.56,1544,34.2,0,6.8,4,"2011-09-22 12:00:05",2.853,285.6,27.9,28.1,77.47,410.4,"2011-09-22 12:00:55",410.4,0,0,0,0,0,0,0,0,0,0,12.73,3.516,6,6

Your CTD's data are this portion:
(data portion)
2011,9,22,12,1,6,29.3537,5.63757,2.333,0.0643,0.0784,34.1177,1543.372
(status portion)
2011,9,22,12,3,39,1606481,4,12.1,8.7,61.3,137,21.9,138662,7298,3455762,19,3
(logger counts)
0,0,23,1,1

The explanation of these fields is this:
(data portion)
PacIOOS Year
PacIOOS Month
PacIOOS Day
PacIOOS Hour
PacIOOS Minute
PacIOOS Second
PacIOOS Water Temperature
PacIOOS Conductivity
PacIOOS Water Pressure
PacIOOS Chloro Volts
PacIOOS Turbid Volts
PacIOOS Salinity
PacIOOS Sound Velocity

(status portion)
PacIOOS Status Year
PacIOOS Status Month
PacIOOS Status Day
PacIOOS Status Hour
PacIOOS Status Minute
PacIOOS Status Second
PacIOOS Status Serial Num
PacIOOS Status Num Events
PacIOOS Status Volts Main
PacIOOS Status Volts Lith
PacIOOS Status Curr Main
PacIOOS Status Curr Pump
PacIOOS Status Curr Ext
PacIOOS Status Mem Bytes
PacIOOS Status Samples
PacIOOS Status Sample Free
PacIOOS Status Sample Len
PacIOOS Status Headers

(logger counts)
PacIOOS Bad Count
PacIOOS Other Count
PacIOOS Stat Count
PacIOOS Rec Count
PacIOOS Rec SubCount


Just as the "data" values are extracted from an individual data line, the "status" values come from the "getsd" command that the logger sends to the CTD once per hour, at three minutes past the hour.  Here is a sample "status" output from your visit [to Miami] in March of 2010:

<StatusData DeviceType = 'SBE16plus' SerialNumber = '01606481'>
  <DateTime>2010-03-09T09:33:51</DateTime>
  <LoggingState>logging</LoggingState>
  <EventSummary numEvents = '4'/>
  <Power>
     <vMain>13.5</vMain>
     <vLith>8.5</vLith>
     <iMain>61.2</iMain>
     <iPump> 0.4</iPump>
<Executing/>
<Executing/>
<Executing/>
<Executing/>
     <iExt01>20.5</iExt01>
  </Power>
  <MemorySummary>
     <Bytes>8949</Bytes>
     <Samples>471</Samples>
     <SamplesFree>3462589</SamplesFree>
     <SampleLength>19</SampleLength>
     <Headers>5</Headers>
  </MemorySummary>
</StatusData>


The last five values are my diagnostics from the parsing of the CTD output.  Record Counts and Record SubCounts are running counts of how many data records have been received from the CTD.  The Counts are reset to 0 on the hour and the SubCounts are reset to 0 every six minutes.  The Counts should cycle from 0 to 10 throughout the hour and the SubCounts should always be 0 or 1.

The Status Counts are the number of lines that were recognized as elements of a status message, by virtue of beginning with <.  There used to be 23 of these (as in the message above), consistently.  Sometime during the station outage (October - March), this number grew to 26.  More about this in a moment.

The "Bad" Counts are lines that were recognized as status lines but didn't match any expected string.  There used to be zero of these but sometime between October and March there began to be 3 such lines.  Along with the StatCount jump from 23 to 26, this is consistent with the CTD adding three new lines of unanticipated formatting to the same old status message.  Looking through the bits of datalogger parsing variables (via the cellular modem) I see signs of a line reading "<Executed/>".  Note that this line wasn't in my sample output when I wrote the program so I didn't match against it.  My guess is that all three of these "Bad" lines now appearing in every status message are this "<Executed/>" line.  I'm not overly worried about it.

"Other" Counts are lines that were not recognized as either status or data messages.  The program also throws away "s>" lines and lines that contain nothing but "sbe 16plus", as well as blank lines.  So the "Other" counts have always been zero, and remain so.

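[editor's note: what follows is an illustrative Python paraphrase of these counting rules.  The list of expected status strings is reconstructed from the sample message earlier in this post, not taken from the actual logger program:]

EXPECTED_STATUS_PREFIXES = (
    "<StatusData", "<DateTime>", "<LoggingState>", "<EventSummary",
    "<Power>", "<vMain>", "<vLith>", "<iMain>", "<iPump>", "<Executing/>",
    "<iExt01>", "</Power>", "<MemorySummary>", "<Bytes>", "<Samples>",
    "<SamplesFree>", "<SampleLength>", "<Headers>", "</MemorySummary>",
    "</StatusData>",
)

def classify(lines):
    counts = {"stat": 0, "bad": 0, "other": 0, "rec": 0}
    for raw in lines:
        line = raw.strip()
        if not line or line == "s>" or line.lower() == "sbe 16plus":
            continue                              # thrown away outright
        if line.startswith("#"):
            counts["rec"] += 1                    # a data line
        elif line.startswith("<"):
            counts["stat"] += 1                   # part of a status message
            if not line.startswith(EXPECTED_STATUS_PREFIXES):
                counts["bad"] += 1                # e.g. the new "<Executed/>" lines
        else:
            counts["other"] += 1                  # unrecognized line type
    return counts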
Whew.  Now let's look at the entire logger record again.  The first field is the logger timestamp for the record.  Picking apart the contents of this record, you see this series of events:

  • at 12:01:06 (data values 2011,9,22,12,1,6) the logger records the CTD data output.
  • at 12:03:39 (data values 2011,9,22,12,3,39) the logger records the CTD status output.  This strongly suggests that the CTD's clock is running about 39 seconds faster than the logger clock (which, though imperfect, seems reasonable to me), since the logger sends the "getsd" command at exactly 3 minutes past the hour.  Note that these values are repeated in the 6-minute data table output for 10 records, since the status values are updated only once per hour.  I didn't have to write these values into the 6-minute table at all but I thought you might like to receive some of them in your data feed, which is built from the 6-minute reports.  You decided against that, but they're not taking up much room so I've left them in.
  • at 12:06:00 (logger record timestamp "2011-09-22 12:06:00"), the logger takes a snapshot of a pre-determined subset of its system variables and writes their content into the 6-minute data table.  Those are all of the values you see above.  Note that some of them are logger timestamps, like the timestamps of maximum winds or minimum barometric pressures.  Your "timestamps" though, as I've explained, are just sets of six unrelated integers as far as the logger is concerned.

So the question is, in what form did the CTD output this data line on the serial line to the logger?  Your CTD downloads suggest that it must have been in this form:

# 29.3537,  5.63757,    2.333, 0.0643, 0.0784,  34.1177, 1543.372, 22 Sep 2011 11:44:05

My logger records say it must have been in this form:

# 29.3537,  5.63757,    2.333, 0.0643, 0.0784,  34.1177, 1543.372, 22 Sep 2011 12:01:06

My conclusion:  I cannot see how the logger could have "made up" the "12:01:06" timestamp.  It's just not that smart, and all of the timing is otherwise consistent as you can see.  I could maybe understand if the parsing routines were switching seconds with minutes, or something.  I could maybe understand if "real" timestamps from elsewhere in your download file were being misapplied to the wrong CTD data, maybe with a consistent offset in number of records.  But there's no player in this picture who is smart enough to read "11:44:05" and come up with "12:01:06", particularly across hour/day/month boundaries.

I would suggest you investigate the integrity of the CTD download file a bit more.  My main thought is, what would happen if the laptop used to download the CTD data was 17 minutes off in its system clock?  Do you have the "raw" CTD data file or some kind of SeaBird-converted version of it?  What version of the SeaBird software was used to download the file (and might that specific version be vulnerable to a mis-set system clock)?  Was the file sent to you in UTC, ChST, HST, EST, EDT, or some other timezone?  Do you have a CTD in the lab that you can test against that version of the SeaBird software with a laptop that has the wrong system time?

I'm also wondering whether this same offset can be seen post-station-revival.  I never got confirmation on exactly when the CTD data were downloaded, but I suspect from a dip in the measurement counts that it might have happened on March 26th.  If so, then you should have both sets of data for the period from March 19th to March 26th, right?  I'm wondering if that week's data showed the same time offset that you're seeing in the August - October data.

Wednesday, April 25, 2012

Datalogger files erased, feeds resume

Following the April 12th - 19th Docomo outage, some of the datalogger's data tables stopped syncing properly.  I made a few attempts to work around the problem by connecting (via the cellular modem) to the station with the LoggerNet software, but eventually I decided to cut our losses and reformat the logger's memory card to resume full data collection as soon as possible.

All available data files were downloaded from the logger on April 24th, then the memory card was formatted.  After that, all data functions (storage and retrieval) returned to normal.

The data table most affected by this problem was the "6 minute" table.  Those data are available from March 20th (reinstallation) to April 12th (Docomo outage), and from April 24th (reformatting) onwards.

With the 6-minute data table operating normally again, our data feeds to PacIOOS and NDBC have resumed.

Mike J+

Friday, April 20, 2012

Station back online -- Docomo had network problems

[From an email by Mike Jankulak on Thursday, April 19, 2012:]
The cellular modem was back online this morning when I arrived at work.  Right now our data feeds are still down because the server files are out of sync with what's stored on the logger (this happened last month too), so I'm working on that.
[From an email by Ross Timmerman on Thursday, April 19, 2012:]
Just had a look at AceManager and all the modem settings appear okay. I don't know why the modem went down earlier, but it was definitely network related. I asked Docomo whether they had a network outage, their response below (sent last night):

"Your services were not disconnected.  Although, we may have resolved the problem.  Please try connecting to the device again, and advise if successful or not."

Saturday, April 14, 2012

saipan station offline again (apr 12)

[An excerpt from an email sent from Miami on Friday, April 13th, 2012.]

Bad news, the data feed from the CREWS station has stopped again.

This occurred (in the various time zones of relevance) at:

Thu Apr 12  5:10:35 (Honolulu / HAST / -1000)
Thu Apr 12 11:10:35 (Miami / EDT / -0400)
Thu Apr 12 15:10:35 (UTC / +0000)
Fri Apr 13  1:10:35 (Saipan / ChST / +1000)

So it was in the early hours of the morning, local time, on Friday the 13th.  I see no signs of trouble in the data feed -- no voltage spikes or drops, no drop in barometric pressures or rise in winds, no instruments going suddenly or suspiciously offline.

So my best guess right now is that the modem went silent again.  Recall that we still haven't explained why that happened on October 2nd, although we do know why the station itself died on October 4th.

Mike J+

Friday, April 13, 2012

Maintenance log: Station cleaning with DEQ and CRM

[From David Benavente's LAOLAO Bay ICON Station Maintenance log -- April 12, 2012:]

Station cleaning with DEQ and CRM. There was a three-way objective to this trip. MMT had to accomplish a site survey, collect water quality samples and clean the ICON Station. The team broke into smaller groups. Ben C., assisted by Jose Q., was tasked with cleaning the ICON Station. When our surveys were complete I inspected the station; Ben C. reported that he replaced the copper screens on the Deep CTD.

Friday, April 6, 2012

Station Update: Data Feeds Resuming

The Saipan CREWS station's "brain" (control unit) was reinstalled by David Benavente and colleagues on Monday, March 19th, 2012, as described elsewhere on this blog.

David & co. did a fantastic job, particularly considering that the brain-reconnection training we had hoped to give them last summer wasn't possible due to a combination of terrible seas and missing radio antennas. We have been relying on email (and my lengthy documents of instruction) for training, for everything from software installation and radio communications to installing and connecting the station's hardware.

Their March 19th visit had to be cut short and at the time they left, they reported that the Deep BIC (light sensor) and Deep CTD both seemed to be offline. In fact, there is a delayed startup for the CTD and it can take 6 or 12 minutes (or sometimes longer) for it to begin producing data. When I later examined the data feed, the Deep CTD was communicating normally.

The Deep BIC is offline. It is the only communications failure following the station's recovery. This could be caused by any number of wiring problems at the top of the pylon. I would not consider this to be a hugely urgent matter, but during the next visit David & co. may want to open up the top of the pylon and visually inspect the Deep BIC plugs -- are they connected properly, are they pushed together all the way, is there any sign of loose or broken wires on either side?

The other anomaly in the data stream is that the salinity numbers from the Deep CTD are not tracking properly with those from the Shallow/PacIOOS CTD. I would tend to believe what the Shallow/PacIOOS CTD is saying, particularly since it has recently been retrieved for memory download and battery replacement, and presumably was examined and cleaned at that time. The local team might want to examine the Deep CTD more closely during the next visit, give it a good cleaning and maybe replace its copper screens if they have started to dissolve.

Also, it would be a good idea to do a full station cleaning, including the connection of the "groundtruth" CT sensor for the required three hours. This would give us another set of salinity numbers to confirm which of the two CTDs is inaccurate.

In addition, there is still the "extra" or "shallow" CT. We had originally intended to install this CT permanently in August but the instrument we'd planned to use was nonfunctional when retrieved from storage. The shipment that returned the "brain" to Saipan also included a replacement CT and this can be deployed at any time using the cables that were installed and connected last summer.

The data stream shows one short blip from the Shallow/PacIOOS CTD on about March 27th. I am assuming that this is when the Shallow/PacIOOS CTD was temporarily disconnected for memory download and battery replacement.

In the past few days, I have started up all of the Saipan data feeds again. The most recent 24/72 hours of data are updated hourly/daily on our web site:


The data feed to the National Data Buoy Center (NDBC) has been restarted. As of this writing, oceanic measurements are already populated on the NDBC site and a configuration error with the meteorological data has been corrected so those data should start loading shortly:


The feed of CTD data from AOML to PacIOOS is under development and a version of that feed began running yesterday.

And finally, data are loading into CHAMP's "ecoforecast" page here:


Congratulations to David B. and the rest of the team in Saipan on a job well done!

(signed)
Mike Jankulak, AOML, Miami

Tuesday, April 3, 2012

And it's ON:

On the morning of March 19th, three MMT members (John Iguel, Ryan Okano & I) loaded up CRM’s small zodiac with the Laolao Bay Pylon Station’s “brain”. With only a short timeframe to work in, due to tidal constraints, we hurried out to the bay, pushed our small boat over the reef and headed for the station.

Once at the station everything seemed to go according to plan. I geared up and climbed to the top of the station. After months of reading through extraction and installation instructions provided by Mike J, I had clearly developed a system for keeping track of the wiring. After about ten minutes of set-up I signaled Ryan to send up the Brain. Once the Brain was at the top of the station all my focus was set on its installation, which to my surprise went by very quickly. Upon switching the brain “On” I was excited to see that all its components were lighting up and blinking normally.

Back on the boat, with my laptop out, we checked to see if the brain was working properly. Although everything seemed to be working, more diagnostics are still very much needed. I felt a great sense of pride upon reinstalling the Brain, and I am so grateful for all the help from NOAA AOML, PACIOOS and the MMT. Special thanks to Mike J. and Mike S. from NOAA AOML.

Saturday, February 4, 2012

A "smoking gun" ...

Mike Shoemaker and I examined the brain this morning for clues about what went wrong on Oct 4th. Almost immediately we found that one of the two fuses was blown.

How this could have happened: there is a "screw-down block" at the top of the brain, with "+12V" terminals arrayed on one side in red and "Ground" terminals on the other side in black. We think something touched this block in a way that completed the circuit and blew one of the two fuses. In my opinion the most likely cause was the windbird plug, which includes one unshielded wire that acts as the instrument's ground. This wire was originally wrapped in electrical tape but I had to expose everything on that last visit (Saturday, August 27th) to rewire the plugs for the windbird because it wasn't working properly, and evidently I did not rewrap it before leaving. Then on Oct 4th, David must have shifted the tangle of wires enough to short-circuit that board in the moments before he powered off the station with the on/off switch. It could have been shorted by other things during that Oct 4th visit (a little seawater, a screwdriver, other metal tools) but the windbird-plug explanation is plausible.

The effect: with the whole screw-down block powerless, none of the electronics would have had power -- not the brain, not the modem, not the radio, and none of the instruments (including the PacIOOS CTD). When the power switch was turned back on again, the only working parts would have been the batteries, the charger-controller (with a red light to indicate charging), and the solar panels. So we think that the station continued to charge its batteries every day until brain removal in January but nothing else was working during that time (except the PacIOOS CTD, running off its own battery power).

What comes next: Shoemaker has the brain in his lab and will be checking it over for (other) signs of trouble. We will be wrapping that windbird plug in tape again, and Shoemaker has a plan to cover that screw-down block with some kind of shield to avoid any repeat of this problem. When he's done I will update the programming (to the latest version) and put this brain on our lab's roof with some surface instruments. This should confirm that the electronics are all operational and that the charger-controller can still charge a battery. I'll let that run for a few days and then we'll box up the brain and ship it back to Saipan again. We'll be sure to ship spare fuses back to Saipan as well.

Mike J+

Wednesday, February 1, 2012

Troubleshooting continues

As of Tuesday afternoon (Saipan time) on January 31st, 2012, the station's control unit or "brain" is on its way back to Miami for examination and possible repair by AOMLers.

This is part of a three-step approach to bringing the station back online:
  1. Finding out why the cellular modem went offline (Oct 2nd).
  2. Finding out why the station lost power (Oct 4th).
  3. Finding out why the RF radios aren't working.
For (1) the cellular modem, it does not appear to be malfunctioning. It was reachable on land via wireless connection several times during troubleshooting operations jointly undertaken by David Benavente, Steven Johnson and Ross Timmerman last November. It merely seems to have gone spontaneously offline on October 2nd, and then later its station power supply seemed to have failed by November 29th. The only thing that stands out from its diagnostics is the unusually high number of "system resets" noted by Ross.

One possible avenue of investigation concerns an AT&T cellular modem that AOML is operating here in South Florida at a test station in Fort Lauderdale. On January 23rd I noticed that this modem's communications were undergoing frequent resets in a way reminiscent of what the Docomo modem had done in Saipan before failing. I have relatively easy access to this modem if it should fail, and I've contacted Campbell Scientific support to try to interest them in looking at our software logs from this test station.

For (2) the station's loss of power, we will see if we learn anything from our examination of the brain. [It's entirely possible that we will find that everything appears to be normal, in which case we can only return the brain to Saipan and attempt reinstallation.] Earlier this week we successfully concluded another test of the electronics when David powered up both the datalogger and the cellular modem on the workbench, and I was able to connect to the logger from our systems here in Florida. I was able to download all of the 1-day, 60-minute, 6-minute and 1-minute data tables while this test was still running. So our analysis will initially focus on the wires and connections that supply power to all of these components.

Regarding (3) the RF radios, we have a success to report. David and Steven were able to get their RF radio connection working on land, after reviewing some configuration settings that I suggested might be causing problems. This is a very important step because it means they will have a way to connect to the station from a laptop in the boat, when it comes time to reinstall the brain. They will know immediately, after powering up the station, whether it is running and whether all of the instruments are properly connected. [There is still no way to tell from the boat whether the cellular modem is working properly. We might be able to brainstorm something if they have a wireless internet connection out there, or just coordinate their reinstallation so that they have someone to call on land who can try to connect to the modem.]

Mike Jankulak

Saturday, January 14, 2012

brain retrieval and data file analysis

Again the background: the Saipan CREWS station lost communications (October 2nd), an attempt was made to power-cycle the cellular modem (October 4th or 5th) followed by the retrieval of that modem (November 14th or 15th) for testing and evaluation, a land-based test of the modem near the station (November 21st) and the reinstallation of the cellular modem at the station (November 28th or 29th). This was followed by an unsuccessful attempt to connect to the station by radio (December 15th).

The only diagnostic step left was to retrieve the station's control unit (or "brain"). This unfortunately would have the effect of shutting down all station operations except for the PacIOOS-supplied CTD, which has its own battery backup. However, at this point we still had no assurance whatsoever that the station had continued to operate in any capacity after its initial loss of communications on October 2nd.

On December 19th (though mindful of the disruptive nature of the holiday season) I sent out detailed instructions to David Benavente (Coastal Resources Management, Saipan) and Steven Johnson (Division of Environmental Quality, Saipan) on how to safely disconnect all power and instruments at the station and remove the brain. On January 3rd I followed up with instructions on how to recover the station's locally-stored data files once the brain had been retrieved.

On Thursday, January 12th at 8:30am (Saipan time) Steven reported some good news:
David and I were able to retrieve the brain out of the station yesterday. We will start doing some preliminary trouble shooting today. We will keep you posted on our progress.
This was followed later on at 3:39pm with a message from David saying:
So I connected to the control unit and downloaded data for the first TAB0/ TAB1 and TAB2. I uploaded them on to the AOML ftp site. I didn't have to enter a username when uploading the files so I wasn't sure whether they had gotten through. Just let me know if they didn't and I'll try again.

Oh, another thing that I noticed while retrieving the brain was that there was a bit of moisture inside the grounding plug. As I pulled the two ends apart a drop of liquid fell onto my hand. Not sure what the implications of that are but it seemed odd because everything else was dry. Well, that's it for now, I'll continue to download the other files and send them over. Let me know if the files downloaded correctly.
This represented the first new data report from the station since it lost communications on October 2nd. I will try to be very clear about what I have learned:
  1. The station has been completely offline since early October, 2011. This of course is a crushing disappointment for all of us.
  2. The biggest surprise is that the data stream ends on October 4th, not on October 2nd when communications were lost. There are no indications of unusual circumstances in the data record at the time when the station initially lost communications. This initial loss of communications took place at UTC Sunday, October 2nd 19:10 (in Saipan time this is at 5:10am on the morning of Monday, October 3rd).
  3. This means that the initial loss of communications is still unexplained. My guess would be either a gradually-worsening loose power connection or some kind of progressive failure of the cellular modem. Our best evidence about this failure remains Ross Timmerman's discovery of the unusually-large number of system resets by the modem (described in this previous blog posting).
  4. The station's data record ends at UTC Tuesday, October 4th 4:12 (in Saipan time this would have been at about 2:12pm on Tuesday, October 4th). Again, there is no indication whatsoever in the data record of any problems leading up to the moment of total systems failure. The station appears to have been operating perfectly for those last two days except for its loss of communications. I have gone instrument by instrument and diagnostic by diagnostic over every retrieved data point and the station appears to have been operating perfectly up to the very last minute.
  5. Thus far I have only seen the station's 1-day, 60-minute and 6-minute data tables. There may be some further details to be gleaned from the other three data tables (1-minute, 30-second and 5-second). But based on what I've seen so far, there are not likely to be any significant revelations in these other data tables.
  6. There is no possibility of data corruption (i.e., it is not possible that the station continued to operate normally after October 4th but we merely failed to recover its complete data record) because the datalogger numbers its records and none are missing.
One question remains to be investigated: when exactly was the modem power-cycled? To my mind the most likely explanation is that the cellular modem failed for reasons unknown and then some further problem was accidentally introduced when the station top was opened up during that October 4th/5th power-cycling visit. Given that there is no sign of station or instrument distress up to the moment of failure, the most likely explanation is human intervention.

This, sadly enough, is the risk that we take every time that we open up the CREWS station top and work with its innards. Normally our "insurance policy" is to connect to the station via radio from the boat every time after closing up the station top. But unfortunately we did not have the time, weather, equipment and software that we needed to configure and test the radios during installation in August. This left the on-site maintenance team without their most important tool.

The takeaway from all this is that we probably have a wire (or many wires) pulled loose somewhere. We know that the datalogger probably lost power on October 4th and has never regained it. We know that the cellular modem did not appear to have power when it was reinstalled on November 28th or 29th. We also have reason to think that some of the other components, possibly the SIO4 serial ports or the radio, have had at least intermittent power since October 4th, given David's report of seeing lights blinking on November 28th or 29th.

So the next thing to try would be a visual examination of the "brain" unit for signs of loose wires. This may be followed either by attempts to reinstall the brain in the station or perhaps by shipping the entire "brain" package back to Miami for evaluation. I will update this blog again when we have decided on our next step.