How to use

Example file:

KorEXO MEASUREMENT DATA FILE EXPORT,,,,,,,,,,,,,,,,,,,,,
,,,,,,,,,,,,,,,,,,,,,
FILE CREATED:,10/16/2020 6:21:53 AM,,,,,,,,,,,,,,,,,,,,
,,,,,,,,,,,,,,,,,,,,,
,,,MEAN VALUE:,1172.5,0.605,1277.2,85.5,85.6,7.66,47.1,0.859,0.64,1268.1,824,NA,-126,20.679,0.812,3.13,11.4,
,,,STANDARD DEVIATION:,566.7,0.791,617.1,9.2,9.2,0.97,1.1,1.123,0.31,612.7,398,0.1,5.7,0.8,0.972,0,0,
,,,,,,,,,,,,,,,,,,,,,
,,,SENSOR SERIAL NUMBER:,19A103955,19B104242,19A103955,19D101830,19D101830,19D101830,19B105042,19B104242,19A103955,19A103955,19A103955,19B105042,19B105042,19A103955,19B104242,19C000969,19C000969,
Date (MM/DD/YYYY),Time (HH:mm:ss),Time (Fract. Sec),Site Name,Cond µS/cm,Depth m,nLF Cond µS/cm,ODO % sat,ODO % local,ODO mg/L,ORP mV,Pressure psi a,Sal psu,SpCond µS/cm,TDS mg/L,pH,pH mV,Temp °C,Vertical Position m,Battery V,Cable Pwr V,DTW
10/16/2020,15:49:33,0,SQR097,3,-0.001,3.4,102.3,102.5,9.5,49.6,-0.002,0,3.4,2,9.23,-137.6,18.996,-0.002,3.13,11.4,4.2
10/16/2020,15:49:34,0,SQR097,3,-0.001,3.4,102.3,102.5,9.5,49.5,-0.002,0,3.4,2,9.23,-137.6,18.998,-0.001,3.13,11.4,
10/16/2020,15:49:35,0,SQR097,3,-0.001,3.4,102.3,102.5,9.49,49.4,-0.002,0,3.4,2,9.23,-137.6,18.999,-0.001,3.13,11.4,
10/16/2020,15:49:36,0,SQR097,3,-0.001,3.4,102.3,102.5,9.49,49.3,-0.002,0,3.4,2,9.23,-137.6,19.001,-0.001,3.13,11.4,
10/16/2020,15:49:37,0,SQR097,3,-0.001,3.4,102.3,102.5,9.5,49.2,-0.002,0,3.4,2,9.23,-137.6,19.002,-0.001,3.13,11.4,
10/16/2020,15:49:38,0,SQR097,3,-0.001,3.4,102.3,102.5,9.49,49.1,-0.002,0,3.4,2,9.23,-137.6,19.004,-0.001,3.13,11.4,
10/16/2020,15:49:39,0,SQR097,3,-0.001,3.4,102.4,102.5,9.49,48.9,-0.002,0,3.3,2,9.23,-137.6,19.005,-0.001,3.13,11.4,
10/16/2020,15:49:40,0,SQR097,3,-0.001,3.4,102.4,102.5,9.5,48.8,-0.002,0,3.3,2,9.23,-137.6,19.007,-0.002,3.13,11.4,
10/16/2020,15:49:41,0,SQR097,3,-0.001,3.4,102.4,102.5,9.49,48.7,-0.002,0,3.3,2,9.23,-137.6,19.008,-0.002,3.13,11.4,

To read this file:

[1]:
import korexo_profile
data = korexo_profile.read("../tests/example2.csv")

This returns a dictionary with three keys:

[2]:
data.keys()
[2]:
dict_keys(['metadata', 'datasets', 'dataframe'])

Sometimes, people open the raw file in Excel to view or edit it in some small way and then save as a CSV. This seems innocuous, but completely changes the format (and usually the encoding also - see below).

Raw pandas dataframe

The raw data is read into a pandas dataframe:

[3]:
df = data["dataframe"]
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 9 entries, 0 to 8
Data columns (total 22 columns):
 #   Column               Non-Null Count  Dtype
---  ------               --------------  -----
 0   Date (MM/DD/YYYY)    9 non-null      object
 1   Time (HH:mm:ss)      9 non-null      object
 2   Time (Fract. Sec)    9 non-null      int64
 3   Site Name            9 non-null      object
 4   Cond µS/cm           9 non-null      int64
 5   Depth m              9 non-null      float64
 6   nLF Cond µS/cm       9 non-null      float64
 7   ODO % sat            9 non-null      float64
 8   ODO % local          9 non-null      float64
 9   ODO mg/L             9 non-null      float64
 10  ORP mV               9 non-null      float64
 11  Pressure psi a       9 non-null      float64
 12  Sal psu              9 non-null      int64
 13  SpCond µS/cm         9 non-null      float64
 14  TDS mg/L             9 non-null      int64
 15  pH                   9 non-null      float64
 16  pH mV                9 non-null      float64
 17  Temp °C              9 non-null      float64
 18  Vertical Position m  9 non-null      float64
 19  Battery V            9 non-null      float64
 20  Cable Pwr V          9 non-null      float64
 21  DTW                  1 non-null      float64
dtypes: float64(15), int64(4), object(3)
memory usage: 1.7+ KB
[4]:
df.head()
[4]:
Date (MM/DD/YYYY) Time (HH:mm:ss) Time (Fract. Sec) Site Name Cond µS/cm Depth m nLF Cond µS/cm ODO % sat ODO % local ODO mg/L ... Sal psu SpCond µS/cm TDS mg/L pH pH mV Temp °C Vertical Position m Battery V Cable Pwr V DTW
0 10/16/2020 15:49:33 0 SQR097 3 -0.001 3.4 102.3 102.5 9.50 ... 0 3.4 2 9.23 -137.6 18.996 -0.002 3.13 11.4 4.2
1 10/16/2020 15:49:34 0 SQR097 3 -0.001 3.4 102.3 102.5 9.50 ... 0 3.4 2 9.23 -137.6 18.998 -0.001 3.13 11.4 NaN
2 10/16/2020 15:49:35 0 SQR097 3 -0.001 3.4 102.3 102.5 9.49 ... 0 3.4 2 9.23 -137.6 18.999 -0.001 3.13 11.4 NaN
3 10/16/2020 15:49:36 0 SQR097 3 -0.001 3.4 102.3 102.5 9.49 ... 0 3.4 2 9.23 -137.6 19.001 -0.001 3.13 11.4 NaN
4 10/16/2020 15:49:37 0 SQR097 3 -0.001 3.4 102.3 102.5 9.50 ... 0 3.4 2 9.23 -137.6 19.002 -0.001 3.13 11.4 NaN

5 rows × 22 columns

Per dataset (recommended)

The information is also stored per-dataset, with some data-type conversions applied under the "datasets" key. I would recommend using this in most cases. Each item is a dictionary with keys as shown below.

The Date, Time, and Site Name columns have the handy "median" key, while all others you’d be more interested in the "data" key which contains a numpy ndarray with the data:

[5]:
data["datasets"][0]
[5]:
{'name': 'Date',
 'column': 'Date (MM/DD/YYYY)',
 'sensor': '',
 'mean': <NA>,
 'stdev': <NA>,
 'data': [datetime.date(2020, 10, 16),
  datetime.date(2020, 10, 16),
  datetime.date(2020, 10, 16),
  datetime.date(2020, 10, 16),
  datetime.date(2020, 10, 16),
  datetime.date(2020, 10, 16),
  datetime.date(2020, 10, 16),
  datetime.date(2020, 10, 16),
  datetime.date(2020, 10, 16)],
 'median': datetime.date(2020, 10, 16)}
[6]:
data["datasets"][1]
[6]:
{'name': 'Time',
 'column': 'Time (HH:mm:ss)',
 'sensor': '',
 'mean': <NA>,
 'stdev': <NA>,
 'data': array(['15:49:33', '15:49:34', '15:49:35', '15:49:36', '15:49:37',
        '15:49:38', '15:49:39', '15:49:40', '15:49:41'], dtype=object),
 'median': '15:49:33'}

And the last one:

[7]:
data["datasets"][9]
[7]:
{'name': 'ODO mg/L',
 'column': 'ODO mg/L',
 'sensor': '19D101830',
 'mean': 7.66,
 'stdev': 0.97,
 'data': array([9.5 , 9.5 , 9.49, 9.49, 9.5 , 9.49, 9.49, 9.5 , 9.49]),
 'median': 9.49}

Other metadata

Raw file metadata is also available:

[8]:
data["metadata"]
[8]:
{'created_file': datetime.datetime(2022, 4, 7, 16, 45, 10, 874033),
 'modified_file': datetime.datetime(2022, 4, 7, 16, 45, 10, 875029),
 'created_info': '10/16/2020 6:21:53 AM,,,,,,,,,,,,,,,,,,,,',
 'header_line_no': 9,
 'params': ['Cond µS/cm',
  'Depth m',
  'nLF Cond µS/cm',
  'ODO % sat',
  'ODO % local',
  'ODO mg/L',
  'ORP mV',
  'Pressure psi a',
  'Sal psu',
  'SpCond µS/cm',
  'TDS mg/L',
  'pH',
  'pH mV',
  'Temp °C',
  'Vertical Position m',
  'Battery V',
  'Cable Pwr V',
  'DTW'],
 'sensors': ['19A103955',
  '19B104242',
  '19A103955',
  '19D101830',
  '19D101830',
  '19D101830',
  '19B105042',
  '19B104242',
  '19A103955',
  '19A103955',
  '19A103955',
  '19B105042',
  '19B105042',
  '19A103955',
  '19B104242',
  '19C000969',
  '19C000969',
  ''],
 'means': [1172.5,
  0.605,
  1277.2,
  85.5,
  85.6,
  7.66,
  47.1,
  0.859,
  0.64,
  1268.1,
  824,
  <NA>,
  -126,
  20.679,
  0.812,
  3.13,
  11.4,
  nan],
 'stdevs': [566.7,
  0.791,
  617.1,
  9.2,
  9.2,
  0.97,
  1.1,
  1.123,
  0.31,
  612.7,
  398,
  0.1,
  5.7,
  0.8,
  0.972,
  0,
  0,
  nan]}

Date formats

I have found that the way the KorEXO sondes encode dates to be a little messy. Note that the column header for the first column seems to encode the date format: Date (MM/DD/YYYY). This is detected and used automatically by default. However, I’ve seen some files produced where the date format is backwards, so you can also choose to specify the appropriate format in Python datetime format specifiers:

For this example:

sep=,
KorEXO MEASUREMENT DATA FILE EXPORT

FILE CREATED:,10/12/2019 10:12:33 PM

,,,MEAN VALUE:,601.0,0.298,662.6,86.1,85.6,8.17,-35.2,0.424,0.33,657.3,427,NA,-83.2,18.269,0.348,3.06,11.2
,,,STANDARD DEVIATION:,644.7,0.495,709.6,18.5,18.4,1.95,12.0,0.703,0.35,704.0,458,0.20,11.3,2.257,0.489,0.00,0.0

,,,SENSOR SERIAL NUMBER:,18K105940,19A101644,18K105940,18L102250,18L102250,18L102250,18L101854,19A101644,18K105940,18K105940,18K105940,18L101854,18L101854,18K105940,19A101644,19B100840,19B100840
Date (MM/DD/YYYY),Time (HH:mm:ss),Time (Fract. Sec),Site Name,Cond µS/cm,Depth m,nLF Cond µS/cm,ODO % sat,ODO % local,ODO mg/L,ORP mV,Pressure psi a,Sal psu,SpCond µS/cm,TDS mg/L,pH,pH mV,Temp °C,Vertical Position m,Battery V,Cable Pwr V
11/12/2019,08:40:30,0.0,SQR100,9.1,0.002,11.5,98.0,97.4,10.02,11.8,0.002,0.00,11.4,7,7.59,-44.0,14.354,0.003,3.06,11.1
11/12/2019,08:40:31,0.0,SQR100,9.0,0.002,11.5,98.0,97.5,10.02,12.2,0.002,0.00,11.4,7,7.59,-43.8,14.352,0.004,3.06,11.1
11/12/2019,08:40:32,0.0,SQR100,9.0,0.002,11.5,98.1,97.6,10.03,11.0,0.002,0.00,11.3,7,7.59,-43.7,14.349,0.000,3.06,11.1
11/12/2019,08:40:33,0.0,SQR100,9.0,0.002,11.5,98.6,98.1,10.09,8.7,0.003,0.00,11.3,7,7.61,-45.4,14.357,0.003,3.06,11.1
11/12/2019,08:40:34,0.0,SQR100,9.1,0.002,11.5,99.3,98.8,10.16,-9.7,0.003,0.00,11.4,7,7.87,-59.5,14.420,0.001,3.06,11.1
11/12/2019,08:40:35,0.0,SQR100,9.0,0.002,11.5,99.7,99.1,10.18,-14.5,0.003,0.00,11.3,7,7.98,-65.6,14.523,0.001,3.06,11.1
11/12/2019,08:40:36,0.0,SQR100,9.0,0.002,11.4,101.1,100.5,10.30,-21.2,0.002,0.00,11.2,7,8.09,-71.9,14.754,-0.004,3.06,11.1
11/12/2019,08:40:37,0.0,SQR100,9.1,0.002,11.3,102.3,101.8,10.37,-23.1,0.003,0.00,11.2,7,8.13,-73.9,15.171,0.001,3.06,11.1
11/12/2019,08:40:38,0.0,SQR100,9.2,0.002,11.3,102.6,102.1,10.31,-25.5,0.003,0.00,11.2,7,8.16,-75.9,15.666,0.001,3.06,11.1
11/12/2019,08:40:39,0.0,SQR100,9.2,0.002,11.3,102.3,101.7,10.17,-27.3,0.003,0.00,11.1,7,8.20,-78.1,16.020,0.000,3.06,11.2
11/12/2019,08:40:40,0.0,SQR100,9.3,0.002,11.3,101.4,100.8,10.00,-28.3,0.003,0.00,11.1,7,8.22,-79.7,16.220,0.002,3.06,11.2
11/12/2019,08:40:41,0.0,SQR100,9.2,0.002,11.2,100.4,99.9,9.87,-28.5,0.003,0.00,11.1,7,8.24,-80.7,16.250,-0.001,3.06,11.2

The default automatic detection:

[9]:
data = korexo_profile.read("../tests/example1.csv", datefmt="auto")
data["datasets"][0]["median"]
[9]:
datetime.date(2019, 11, 12)

Specifying it directly should provide the same result:

[10]:
data = korexo_profile.read("../tests/example1.csv", datefmt="%m/%d/%Y")
data["datasets"][0]["median"]
[10]:
datetime.date(2019, 11, 12)

But in this case the true date is actually December 11th, which we can force:

[11]:
data = korexo_profile.read("../tests/example1.csv", datefmt="%d/%m/%Y")
data["datasets"][0]["median"]
[11]:
datetime.date(2019, 12, 11)

Beware encodings

I believe the sondes use UTF-16-LE by default. Most text editors do not use this, so if someone has opened the file and then saved it, it likely will have converted it to UTF-8 or Windows-1252.

The default behaviour is simplest:

[12]:
data = korexo_profile.read("../tests/example2.csv")

This is equivalent internally to:

[13]:
data = korexo_profile.read("../tests/example2.csv", encoding="utf-16", auto_revert_encoding="cp1252")

Because auto_revert_encoding is not False, the code will first check whether example2.csv is UTF-16. If it is not, then the file will be loaded using the value of auto_revert_encoding ('cp1252'). If it is UTF-16, it will be loaded as such.

If you want the code to simply use the encoding you direct it to, set auto_revert_encoding=False. This may result in messy errors and weird behaviour.

Skip all that… just give me a complete spreadsheet

[14]:
data = korexo_profile.read("../tests/example1_full.csv")
df = korexo_profile.convert_datasets_to_df(data["datasets"])
df.head()
[14]:
datetime date time time_sec site cond water_depth cond_nlf do_sat do_local ... press sal_psu spcond tds ph ph_mv temp vert_pos battery cable_power
0 2019-11-12 08:40:30 2019-11-12 08:40:30 0.0 SQR100 9.1 0.002 11.5 98.0 97.4 ... 0.002 0.0 11.4 7 7.59 -44.0 14.354 0.003 3.06 11.1
1 2019-11-12 08:40:31 2019-11-12 08:40:31 0.0 SQR100 9.0 0.002 11.5 98.0 97.5 ... 0.002 0.0 11.4 7 7.59 -43.8 14.352 0.004 3.06 11.1
2 2019-11-12 08:40:32 2019-11-12 08:40:32 0.0 SQR100 9.0 0.002 11.5 98.1 97.6 ... 0.002 0.0 11.3 7 7.59 -43.7 14.349 0.000 3.06 11.1
3 2019-11-12 08:40:33 2019-11-12 08:40:33 0.0 SQR100 9.0 0.002 11.5 98.6 98.1 ... 0.003 0.0 11.3 7 7.61 -45.4 14.357 0.003 3.06 11.1
4 2019-11-12 08:40:34 2019-11-12 08:40:34 0.0 SQR100 9.1 0.002 11.5 99.3 98.8 ... 0.003 0.0 11.4 7 7.87 -59.5 14.420 0.001 3.06 11.1

5 rows × 22 columns

And let’s say the depth to water was 5.15 m, so let’s add that offset:

[15]:
df["dtw"] = df.water_depth + 5.15
df.head()
[15]:
datetime date time time_sec site cond water_depth cond_nlf do_sat do_local ... sal_psu spcond tds ph ph_mv temp vert_pos battery cable_power dtw
0 2019-11-12 08:40:30 2019-11-12 08:40:30 0.0 SQR100 9.1 0.002 11.5 98.0 97.4 ... 0.0 11.4 7 7.59 -44.0 14.354 0.003 3.06 11.1 5.152
1 2019-11-12 08:40:31 2019-11-12 08:40:31 0.0 SQR100 9.0 0.002 11.5 98.0 97.5 ... 0.0 11.4 7 7.59 -43.8 14.352 0.004 3.06 11.1 5.152
2 2019-11-12 08:40:32 2019-11-12 08:40:32 0.0 SQR100 9.0 0.002 11.5 98.1 97.6 ... 0.0 11.3 7 7.59 -43.7 14.349 0.000 3.06 11.1 5.152
3 2019-11-12 08:40:33 2019-11-12 08:40:33 0.0 SQR100 9.0 0.002 11.5 98.6 98.1 ... 0.0 11.3 7 7.61 -45.4 14.357 0.003 3.06 11.1 5.152
4 2019-11-12 08:40:34 2019-11-12 08:40:34 0.0 SQR100 9.1 0.002 11.5 99.3 98.8 ... 0.0 11.4 7 7.87 -59.5 14.420 0.001 3.06 11.1 5.152

5 rows × 23 columns

Then re-index so the profile is at 5 cm intervals:

[16]:
df2 = korexo_profile.make_regularly_spaced(df, "dtw", step=0.05)
df2.head()
[16]:
time_sec cond water_depth cond_nlf do_sat do_local do_conc orp_mv press sal_psu spcond tds ph ph_mv temp vert_pos battery cable_power
dtw
5.150000 0.0 9.533333 0.000000 11.500000 91.700000 91.200000 8.910000 -39.200000 0.001000 0.000000 11.366667 7.000000 8.390000 -89.133333 16.663667 0.001333 3.06 11.2
5.202174 0.0 1203.097516 0.052174 1324.406211 92.240373 91.670186 8.264037 -42.380745 0.073963 0.660000 1313.976398 854.192547 8.364037 -88.661491 20.583807 0.419938 3.06 11.2
5.254348 0.0 1220.579710 0.104348 1343.950483 91.725845 91.225845 8.212585 -41.203382 0.148539 0.670000 1333.350483 866.483092 8.305169 -85.658454 20.573034 0.694548 3.06 11.2
5.306522 0.0 1225.072347 0.156522 1348.894872 91.595905 91.095905 8.199795 -40.589761 0.222054 0.670205 1338.192824 870.184295 8.289795 -84.583618 20.575123 0.807270 3.06 11.2
5.358696 0.0 1225.987800 0.208696 1349.886039 91.582138 91.082138 8.199107 -40.555346 0.296185 0.670893 1339.177108 870.803774 8.289107 -84.528553 20.575536 0.811537 3.06 11.2