Closed
Description
Describe the bug
read_crn fails to map -99999 to NaN
To Reproduce
from pvlib.iotools import read_crn
crn = read_crn('https://www.ncei.noaa.gov/pub/data/uscrn/products/subhourly01/2021/CRNS0101-05-2021-NY_Millbrook_3_W.txt')
crn.loc['2021-12-14 0930':'2021-12-14 1130', 'ghi']
2021-12-14 09:30:00+00:00 0.0
2021-12-14 09:35:00+00:00 0.0
2021-12-14 09:40:00+00:00 0.0
2021-12-14 09:45:00+00:00 0.0
2021-12-14 09:50:00+00:00 0.0
2021-12-14 09:55:00+00:00 0.0
2021-12-14 10:00:00+00:00 0.0
2021-12-14 10:05:00+00:00 -99999.0
2021-12-14 10:10:00+00:00 -99999.0
2021-12-14 10:15:00+00:00 -99999.0
2021-12-14 10:20:00+00:00 -99999.0
2021-12-14 10:25:00+00:00 -99999.0
2021-12-14 10:30:00+00:00 -99999.0
2021-12-14 10:35:00+00:00 -99999.0
2021-12-14 10:40:00+00:00 -99999.0
2021-12-14 10:45:00+00:00 -99999.0
2021-12-14 10:50:00+00:00 -99999.0
2021-12-14 10:55:00+00:00 -99999.0
2021-12-14 11:00:00+00:00 -99999.0
2021-12-14 11:05:00+00:00 0.0
2021-12-14 11:10:00+00:00 0.0
2021-12-14 11:15:00+00:00 0.0
2021-12-14 11:20:00+00:00 0.0
2021-12-14 11:25:00+00:00 0.0
2021-12-14 11:30:00+00:00 0.0
Name: ghi, dtype: float64
Expected behavior
Should return NaN instead of -99999
Versions:
- pvlib.__version__: 0.9.0
- pandas.__version__: 1.0.3 (doesn't matter)
- python: 3.7
Additional context
Documentation here says
C. Missing data are indicated by the lowest possible integer for a given column format, such as -9999.0 for 7-character fields with one decimal place or -99.000 for 7-character fields with three decimal places.
So we should change lines 112 to 117 of pvlib-python/pvlib/iotools/crn.py (at 1ab0eb2) to include -99999 and perhaps -999999. Or do the smarter thing as discussed in the comment.
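Until the reader handles these markers, one possible workaround is to replace them after loading. This is only a sketch on a small made-up series standing in for the ghi column; the MISSING_MARKERS name is hypothetical and not part of pvlib.

```python
import numpy as np
import pandas as pd

# Hypothetical list of markers, taken from the values named in this issue.
MISSING_MARKERS = [-99999, -999999]

# Made-up stand-in for crn['ghi']; in practice this would come from read_crn.
ghi = pd.Series([0.0, -99999.0, -99999.0, 0.0], name="ghi")

# Replace every marker with NaN, leaving valid readings untouched.
cleaned = ghi.replace(MISSING_MARKERS, np.nan)
print(cleaned)
```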
Activity
AdamRJensen commented on Jan 7, 2022
@wholmgren I have added -99999 and -999999 to the list of nan values in #1368.
However, I'd also be happy to implement column-specific replacement of nans. Any suggestion on how to do this?
AdamRJensen commented on Jan 9, 2022
I see that the CRN files contain the latitude and longitude of the stations. I suppose it would make sense to parse this and output it in a metadata dictionary?
I realize that outputting a tuple of (data, meta) instead of just data would be a breaking change, but it would be in line with the iotools pattern.
wholmgren commented on Jan 9, 2022
crn.WIDTHS gets you halfway there, but you'd need a corresponding list for the number of decimals allowed in each column. From there you can determine, for each column, how many 9s will fit between a - and the space required for the decimals.
Could be nice, but not sure how we'd handle the breaking change in a nice way. Let's address in a separate issue.
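The width-and-decimals idea could be sketched roughly as follows. The missing_sentinel helper is hypothetical (it is not in pvlib), and it assumes that width counts only the characters of the field itself, with no separator space included.

```python
def missing_sentinel(width: int, decimals: int) -> str:
    """Build the missing-data marker for a field as it would appear in the file.

    Hypothetical helper: `width` is assumed to exclude any separator space.
    """
    # One character is reserved for the minus sign. Fields with decimals
    # also need the decimal point plus the decimal digits themselves.
    reserved = 1 + ((1 + decimals) if decimals > 0 else 0)
    nines = width - reserved
    if nines < 1:
        raise ValueError("field is too narrow to hold a missing-data marker")
    marker = "-" + "9" * nines
    if decimals > 0:
        marker += "." + "0" * decimals
    return marker


# The two cases quoted from the documentation, plus an integer field.
print(missing_sentinel(7, 1))
print(missing_sentinel(7, 3))
print(missing_sentinel(6, 0))
```

Applied to each column's width and decimal count, this would give one marker per column, which is the input a column-specific na_values mapping needs.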
AdamRJensen commented on Jan 10, 2022
It is possible to pass a dictionary to the na_values argument in pd.read_fwf:
Although, currently na_values is set equal to ['\x00\x00\x00\x00\x00\x00'].
However, they can be combined:
and then:
I tested this method and it passes all of the tests.
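A minimal sketch of the dictionary approach, not the code from #1368: the column names, widths, and sample rows below are made up for illustration, and combining the null-character string with the marker in one per-column list is an assumption about how the two could be merged.

```python
import io

import pandas as pd

# Made-up two-column fixed-width sample: a longitude and a ghi reading.
text = (
    " -99.00 -99999\n"
    " -73.74    120\n"
    " -99.00      0\n"
)

# Hypothetical layout for this sample only (not crn.WIDTHS).
names = ["longitude", "ghi"]
widths = [7, 7]

# Per-column missing markers: only ghi treats -99999 as missing.
# The null-character string mirrors the value currently used in crn.py.
na_values = {"ghi": ["\x00\x00\x00\x00\x00\x00", "-99999"]}

df = pd.read_fwf(
    io.StringIO(text),
    widths=widths,
    names=names,
    header=None,
    na_values=na_values,
)
print(df)
```

Because the markers are keyed by column, a legitimate longitude such as -99.00 is left alone while the ghi marker becomes NaN.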
AdamRJensen commented on Jan 11, 2022
Fun fact: there are a few CRN stations that have longitudes in the -99 range, but none that matches exactly:
