
cdfread

Classes:

| Name | Description |
| --- | --- |
| CDF | Read a CDF file into the CDF object. This object contains methods to load the CDF file information, variable names, and values. |

CDF

CDF(path: Union[str, Path], validate: bool = False, string_encoding: str = 'ascii', s3_read_method: int = 1)

Reads a CDF file into a CDF object, which provides methods to load the file's information, variable names, and values.

Example
>>> import cdflib
>>> cdf_file = cdflib.CDF('/path/to/cdf_file.cdf')
>>> cdf_file.cdf_info()
>>> x = cdf_file.varget("NameOfVariable", startrec=0, endrec=150)

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| path | (Path, str) | Path to the CDF file. This can also be a link to a file in an S3 bucket. | required |
| validate | bool | If True, validate the MD5 checksum of the CDF file. | False |
| string_encoding | str | The encoding used to read strings. Defaults to 'ascii', which is what the CDF internal format description prescribes for character strings. Other encodings may have been used to create files, however, and this keyword argument gives users the flexibility to read those files. | 'ascii' |
| s3_read_method | int | If the file lives in an AWS S3 bucket, this defines how it is read: 1 reads the entire file into memory; 2 downloads the file to a temporary directory; 3 reads the file in chunks directly from S3 over HTTPS. | 1 |

Notes

An open file handle to the CDF file remains while a CDF object is live. It is automatically cleaned up when the CDF instance is deleted.
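
Because `path` may name a local file, an HTTP(S) URL, or an S3 object, it can help to see how such a location might be classified. The sketch below mirrors the prefix checks visible in the constructor source; `classify_source` is a hypothetical helper written for illustration, not part of cdflib.

```python
from pathlib import Path
from typing import Union


def classify_source(path: Union[str, Path]) -> str:
    """Return "s3", "url", or "file" for a CDF location (hypothetical helper)."""
    # Path objects are converted to text first; strings are inspected by prefix.
    name = path.as_posix() if isinstance(path, Path) else path
    if name.startswith("s3://"):
        return "s3"
    if name.startswith(("http://", "https://")):
        return "url"
    # Anything else is treated as a local file path.
    return "file"


print(classify_source("s3://bucket/data.cdf"))  # s3
print(classify_source(Path("/tmp/data.cdf")))   # file
```

Only locations classified as "s3" are affected by `s3_read_method`.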

Methods:

| Name | Description |
| --- | --- |
| attget | Returns the value of the attribute at the entry number provided. |
| attinq | Get attribute information. |
| cdf_info | Returns basic CDF information. |
| globalattsget | Gets all global attributes. |
| varattsget | Gets all variable attributes. |
| varget | Returns the variable data. |
| varinq | Get basic variable information. |

Source code in cdflib/cdfread.py
def __init__(self, path: Union[str, Path], validate: bool = False, string_encoding: str = "ascii", s3_read_method: int = 1):
    """
    Parameters
    ----------
    path : Path, str
        Path to CDF file.  This can be a link to a file in an S3 bucket as well.
    validate : bool, optional
        If True, validate the MD5 checksum of the CDF file.
    string_encoding : str, optional
        The encoding used to read strings. Defaults to 'ascii', which is what
        the CDF internal format description prescribes as the encoding for
        character strings. Other encodings may have been used to create files
        however, and this keyword argument gives users the flexibility to read
        those files.
    s3_read_method: int, optional
        If the user is specifying a file that lives within an AWS S3 bucket, this variable
        defines how the file is read in.  The choices are:
        - 1 will read the entire file into memory
        - 2 will download the file to a tmp directory
        - 3 reads the file in chunks directly from S3 over https

    Notes
    -----
    An open file handle to the CDF file remains whilst a CDF object is live.
    It is automatically cleaned up when the CDF instance is deleted.
    """
    if isinstance(path, Path):
        fname = path.absolute().as_posix()
    else:
        fname = path

    self.file: Union[str, Path]
    if fname.startswith("s3://"):
        # later put in s3 'does it exist' checker
        self.ftype = "s3"
        self.file = fname  # path for files, fname for urls and S3
    elif fname.startswith("http://") or fname.startswith("https://"):
        # later put in url 404 'does it exist' checker
        self.ftype = "url"
        self.file = fname  # path for files, fname for urls and S3
    else:
        self.ftype = "file"
        path = Path(path).resolve().expanduser()
        if not path.is_file():
            path = path.with_suffix(".cdf")
            if not path.is_file():
                raise FileNotFoundError(f"{path} not found")
        self.file = path  # path for files, fname for urls and S3
        self.file = path

    self.string_encoding = string_encoding

    self._f = self._file_or_url_or_s3_handler(str(self.file), self.ftype, s3_read_method)
    magic_number = self._f.read(4).hex()
    compressed_bool = self._f.read(4).hex()

    if magic_number not in ("cdf30001", "cdf26002", "0000ffff"):
        raise OSError(f"{path} is not a CDF file or a non-supported CDF!")

    self.cdfversion = 3 if magic_number == "cdf30001" else 2

    self._compressed = not (compressed_bool == "0000ffff")
    self.compressed_file = None
    self.temp_file: Optional[Path] = None

    if self._compressed:
        if self.ftype == "url" or self.ftype == "s3":
            if s3_read_method == 3:
                # extra step, read entire file
                self._f.seek(0)
                self._f = s3_fetchall(self._f.fhandle)  # type: ignore
            self._unstream_file(self._f)
            path = self.file
        self._uncompress_file()
        if self.temp_file is None:
            raise OSError("Decompression was unsuccessful.  Only GZIP compression is currently supported.")

        self.compressed_file = self.file
        self.file = self.temp_file
        self._f.close()
        self._f = self.file.open("rb")
        self.ftype = "file"

    if self.cdfversion == 3:
        cdr_info, foffs = self._read_cdr(8)
        gdr_info = self._read_gdr(foffs)
    else:
        cdr_info, foffs = self._read_cdr2(8)
        gdr_info = self._read_gdr2(foffs)

    if cdr_info.md5 and validate:
        if not self._md5_validation():
            raise OSError("This file fails the md5 checksum.")

    if not cdr_info.format_:
        raise OSError("This package does not support multi-format CDF")

    if cdr_info.encoding in (3, 14, 15):
        raise OSError("This package does not support CDFs with this " + self._encoding_token(cdr_info.encoding) + " encoding")

    # SET GLOBAL VARIABLES
    self._post25 = cdr_info.post25
    self._version = cdr_info.version
    self._encoding = cdr_info.encoding
    self._majority = self._major_token(cdr_info.majority)
    self._copyright = cdr_info.copyright_
    self._md5 = cdr_info.md5
    self._first_zvariable = gdr_info.first_zvariable
    self._first_rvariable = gdr_info.first_rvariable
    self._first_adr = gdr_info.first_adr
    self._num_zvariable = gdr_info.num_zvariables
    self._num_rvariable = gdr_info.num_rvariables
    self._rvariables_num_dims = gdr_info.rvariables_num_dims
    self._rvariables_dim_sizes = gdr_info.rvariables_dim_sizes
    self._num_att = gdr_info.num_attributes
    self._num_rdim = gdr_info.rvariables_num_dims
    self._rdim_sizes = gdr_info.rvariables_dim_sizes
    if self.cdfversion == 3:
        self._leap_second_updated = gdr_info.leapsecond_updated

    if self.compressed_file is not None:
        self.compressed_file = None
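
The constructor above decides the CDF version and whether the file is compressed from the first eight bytes of the file. As a minimal sketch of that check, assuming only the hex values shown in the source, the following standalone function reads the two 4-byte words from any binary stream. `inspect_header` is a hypothetical name and the function is not part of cdflib.

```python
import io
from typing import BinaryIO, Tuple


def inspect_header(stream: BinaryIO) -> Tuple[int, bool]:
    """Return (cdf_version, is_compressed) from the first 8 bytes of a stream."""
    magic = stream.read(4).hex()
    compressed_marker = stream.read(4).hex()
    # Same accepted magic numbers as the constructor source above.
    if magic not in ("cdf30001", "cdf26002", "0000ffff"):
        raise OSError("not a CDF file or an unsupported CDF")
    version = 3 if magic == "cdf30001" else 2
    # The file counts as uncompressed only when the second word is 0000ffff.
    return version, compressed_marker != "0000ffff"


header = bytes.fromhex("cdf30001" + "0000ffff")
print(inspect_header(io.BytesIO(header)))  # (3, False)
```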

attget

attget(attribute: Union[str, int], entry: Optional[Union[str, int]] = None) -> AttData

Returns the value of the attribute at the entry number provided.

A variable name can be used instead of its corresponding entry number.

Parameters:

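| Name | Type | Description | Default |
| --- | --- | --- | --- |
| attribute | (str, int) | Attribute name or number to get. | required |
| entry | (str, int) | Entry number to read. For a variable attribute, the variable name can be given instead. | None |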

Returns:

Type Description
AttData
Source code in cdflib/cdfread.py
def attget(self, attribute: Union[str, int], entry: Optional[Union[str, int]] = None) -> AttData:
    """
    Returns the value of the attribute at the entry number provided.

    A variable name can be used instead of its corresponding
    entry number.

    Parameters
    ----------
    attribute : str, int
        Attribute name or number to get.
    entry : int, optional

    Returns
    -------
    AttData
    """
    # Starting position
    position = self._first_adr

    # Get Correct ADR
    adr_info = None
    if isinstance(attribute, str):
        for _ in range(0, self._num_att):
            name, next_adr = self._read_adr_fast(position)
            if name.strip().lower() == attribute.strip().lower():
                adr_info = self._read_adr(position)
                if isinstance(entry, str) and adr_info.scope == 1:
                    # If the user has specified a string entry, they are obviously looking for a variable attribute.
                    # Filter out any global attributes that may have the same name.
                    adr_info = None
                    position = next_adr
                    continue
                break
            else:
                position = next_adr

        if adr_info is None:
            raise KeyError(f"No attribute {attribute} for entry {entry}")

    elif isinstance(attribute, int):
        if (attribute < 0) or (attribute > self._num_att):
            raise KeyError(f"No attribute {attribute}")
        if not isinstance(entry, int):
            raise TypeError(f"{entry} has to be a number.")

        for _ in range(0, attribute):
            name, next_adr = self._read_adr_fast(position)
            position = next_adr
        adr_info = self._read_adr(position)
    else:
        raise ValueError("Please set attribute keyword equal to " "the name or number of an attribute")

    # Find the correct entry from the "entry" variable
    if adr_info.scope == 1:
        if not isinstance(entry, int):
            raise ValueError('"entry" must be an integer')
        num_entry_string = "num_gr_entry"
        first_entry_string = "first_gr_entry"
        max_entry_string = "max_gr_entry"
        entry_num = entry
    else:
        var_num = -1
        zvar = False
        if isinstance(entry, str):
            # a zVariable?
            positionx = self._first_zvariable
            for x in range(0, self._num_zvariable):
                name, vdr_next = self._read_vdr_fast(positionx)
                if name.strip().lower() == entry.strip().lower():
                    var_num = x
                    zvar = True
                    break
                positionx = vdr_next
            if var_num == -1:
                # a rVariable?
                positionx = self._first_rvariable
                for x in range(0, self._num_rvariable):
                    name, vdr_next = self._read_vdr_fast(positionx)
                    if name.strip().lower() == entry.strip().lower():
                        var_num = x
                        break
                    positionx = vdr_next
            if var_num == -1:
                raise ValueError(f"No variable by this name: {entry}")
            entry_num = var_num
        else:
            if self._num_zvariable > 0 and self._num_rvariable > 0:
                raise ValueError("This CDF has both r and z variables. " "Use variable name instead")
            if self._num_zvariable > 0:
                zvar = True
            entry_num = entry
        if zvar:
            num_entry_string = "num_z_entry"
            first_entry_string = "first_z_entry"
            max_entry_string = "max_z_entry"
        else:
            num_entry_string = "num_gr_entry"
            first_entry_string = "first_gr_entry"
            max_entry_string = "max_gr_entry"
    if entry_num > getattr(adr_info, max_entry_string):
        raise ValueError("The entry does not exist")
    return self._get_attdata(adr_info, entry_num, getattr(adr_info, num_entry_string), getattr(adr_info, first_entry_string))
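
As the source above shows, `attget` raises `KeyError` when the attribute is not found and `ValueError` when the variable name or entry does not exist. A small wrapper can turn those cases into `None`. This is a sketch under those assumptions: `read_variable_attribute` is a hypothetical helper, `cdf` is any object exposing `attget`, and the attribute name `UNITS` in the usage comment is only an example.

```python
from typing import Any, Optional


def read_variable_attribute(cdf: Any, attribute: str, variable: str) -> Optional[Any]:
    """Return the result of cdf.attget(attribute, variable), or None if missing."""
    try:
        # Passing the variable name as the entry selects a variable attribute.
        return cdf.attget(attribute, variable)
    except (KeyError, ValueError):
        # KeyError: no such attribute; ValueError: no such variable or entry.
        return None


# Usage, assuming a CDF object opened as in the example at the top of this page:
# cdf_file = cdflib.CDF('/path/to/cdf_file.cdf')
# units = read_variable_attribute(cdf_file, "UNITS", "NameOfVariable")
```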

attinq

attinq(attribute: Union[str, int]) -> ADRInfo

Get attribute information.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| attribute | (str, int) | Attribute to get information for. | required |

Returns:

Type Description
ADRInfo
Source code in cdflib/cdfread.py
def attinq(self, attribute: Union[str, int]) -> ADRInfo:
    """
    Get attribute information.

    Parameters
    ----------
    attribute : str, int
        Attribute to get information for.

    Returns
    -------
    ADRInfo
    """
    position = self._first_adr
    if isinstance(attribute, str):
        for _ in range(0, self._num_att):
            name, next_adr = self._read_adr_fast(position)
            if name.strip().lower() == attribute.strip().lower():
                return self._read_adr(position)

            position = next_adr
        raise KeyError(f"No attribute {attribute}")

    elif isinstance(attribute, int):
        if attribute < 0 or attribute > self._num_zvariable:
            raise KeyError(f"No attribute {attribute}")
        for _ in range(0, attribute):
            name, next_adr = self._read_adr_fast(position)
            position = next_adr

        return self._read_adr(position)
    else:
        raise ValueError("attribute keyword must be a string or integer")

cdf_info

cdf_info() -> CDFInfo

Returns basic CDF information.

Returns:

Type Description
CDFInfo
Source code in cdflib/cdfread.py
def cdf_info(self) -> CDFInfo:
    """
    Returns basic CDF information.

    Returns
    -------
    CDFInfo
    """
    varnames = self._get_varnames()
    return CDFInfo(
        self.file,
        self._version,
        self._encoding,
        self._majority,
        varnames[0],
        varnames[1],
        self._get_attnames(),
        self._copyright,
        self._md5,
        self._num_rdim,
        self._rdim_sizes,
        self._compressed,
    )

globalattsget

globalattsget() -> Dict[str, List[Union[str, ndarray]]]

Gets all global attributes.

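This function returns all of the global attribute entries from a CDF as a dictionary that maps each attribute name to a list of its entry values (in the form of 'attribute': [value, ...] pairs). Attributes without entries are omitted, and single-element arrays are returned as scalars.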

Source code in cdflib/cdfread.py
def globalattsget(self) -> Dict[str, List[Union[str, np.ndarray]]]:
    """
    Gets all global attributes.

    This function returns all of the global attribute entries,
    in a dictionary (in the form of ``'attribute': {entry: value}``
    pairs) from a CDF.
    """
    byte_loc = self._first_adr
    return_dict: Dict[str, List[Union[str, np.ndarray]]] = {}
    for _ in range(self._num_att):
        adr_info = self._read_adr(byte_loc)
        if adr_info.scope != 1:
            byte_loc = adr_info.next_adr_loc
            continue
        if adr_info.num_gr_entry == 0:
            byte_loc = adr_info.next_adr_loc
            continue
        entries = []
        aedr_byte_loc = adr_info.first_gr_entry
        for _ in range(adr_info.num_gr_entry):
            aedr_info = self._read_aedr(aedr_byte_loc)
            entryData = aedr_info.entry
            # This exists to get rid of extraneous numpy arrays
            if isinstance(entryData, np.ndarray):
                if len(entryData) == 1:
                    entryData = entryData[0]

            entries.append(entryData)
            aedr_byte_loc = aedr_info.next_aedr

        return_dict[adr_info.name] = entries
        byte_loc = adr_info.next_adr_loc

    return return_dict
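
Since `globalattsget` returns a list of entries for every attribute, callers often want a single value when there is only one entry. The sketch below shows one way to do that on a plain dictionary of the same shape. `summarize_global_attributes` is a hypothetical helper, and the attribute names and values in the example are made up for illustration.

```python
from typing import Any, Dict, List


def summarize_global_attributes(global_atts: Dict[str, List[Any]]) -> Dict[str, Any]:
    """Collapse single-entry attributes to their value; keep the rest as lists."""
    summary: Dict[str, Any] = {}
    for name, entries in global_atts.items():
        summary[name] = entries[0] if len(entries) == 1 else list(entries)
    return summary


# Hypothetical data in the shape returned by globalattsget():
example = {"Project": ["ISTP"], "TEXT": ["line one", "line two"]}
print(summarize_global_attributes(example))
# {'Project': 'ISTP', 'TEXT': ['line one', 'line two']}
```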

varattsget

varattsget(variable: Union[str, int]) -> Dict[str, Union[None, str, ndarray]]

Gets all variable attributes.

Unlike attget, which returns a single attribute entry value, this function returns all of the attribute entries for a variable as a dictionary of 'attribute': value pairs.

Source code in cdflib/cdfread.py
def varattsget(self, variable: Union[str, int]) -> Dict[str, Union[None, str, np.ndarray]]:
    """
    Gets all variable attributes.

    Unlike attget, which returns a single attribute entry value,
    this function returns all of the variable attribute entries,
    in a dictionary (in the form of 'attribute': value pair) for
    a variable.
    """
    if isinstance(variable, int) and self._num_zvariable > 0 and self._num_rvariable > 0:
        raise ValueError("This CDF has both r and z variables. Use variable name")
    if isinstance(variable, str):
        position = self._first_zvariable
        num_variables = self._num_zvariable
        for zVar in [True, False]:
            for _ in range(0, num_variables):
                name, vdr_next = self._read_vdr_fast(position)
                if name.strip().lower() == variable.strip().lower():
                    vdr_info = self._read_vdr(position)
                    return self._read_varatts(vdr_info.variable_number, zVar)
                position = vdr_next
            position = self._first_rvariable
            num_variables = self._num_rvariable
        raise ValueError(f"No variable by this name: {variable}")
    elif isinstance(variable, int):
        if self._num_zvariable > 0:
            num_variable = self._num_zvariable
            zVar = True
        else:
            num_variable = self._num_rvariable
            zVar = False
        if variable < 0 or variable >= num_variable:
            raise ValueError(f"No variable by this number: {variable}")
        return self._read_varatts(variable, zVar)
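
One common use of the dictionary returned by `varattsget` is to look up the time variable that a data variable depends on, which ISTP-compliant files store in the `DEPEND_0` attribute. The following is a minimal sketch; `time_variable_for` is a hypothetical helper and `cdf` is any object exposing `varattsget`.

```python
from typing import Any, Optional


def time_variable_for(cdf: Any, variable: str) -> Optional[str]:
    """Return the name stored in the variable's DEPEND_0 attribute, if any."""
    attributes = cdf.varattsget(variable)  # dict of 'attribute': value pairs
    depend = attributes.get("DEPEND_0")
    # Only a string can name another variable; anything else is treated as absent.
    return depend if isinstance(depend, str) else None


# Usage, assuming a CDF object opened as in the example at the top of this page:
# cdf_file = cdflib.CDF('/path/to/cdf_file.cdf')
# epoch_name = time_variable_for(cdf_file, "NameOfVariable")
```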

varget

varget(variable: Optional[str] = None, epoch: Optional[str] = None, starttime: Optional[epoch_types] = None, endtime: Optional[epoch_types] = None, startrec: int = 0, endrec: Optional[int] = None) -> Union[str, ndarray]

Returns the variable data.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| variable | Optional[str] | Variable name to fetch. | None |
| startrec | int | Index of the first record to get. | 0 |
| endrec | int | Index of the last record to get. All records from startrec to endrec inclusive are fetched. | None |

Notes

The variable can be given either as a name or as a variable number. By default, the method returns a 'numpy.ndarray' or 'list' object, depending on the data type, with the variable data and its specification.

By default, the full variable data is returned. To retrieve only part of the data for a record-varying variable, specify either a time range or a record range (0-based), but not both. 'epoch' names the time variable that this variable depends on, which is searched for the time range. For ISTP-compliant CDFs, the time variable is taken from this variable's 'DEPEND_0' attribute; the function looks it up automatically, so 'epoch' need not be specified. If the start or end time is omitted, the minimum or maximum possible value for the epoch data type is assumed. If the start or end record is omitted, the range starts at record 0 and/or ends at the last written record.

The start and end times should each be given as a list:
- [year month day hour minute second millisec] for CDF_EPOCH
- [year month day hour minute second millisec microsec nanosec picosec] for CDF_EPOCH16
- [year month day hour minute second millisec microsec nanosec] for CDF_TIME_TT2000

If fewer time components are given, only the last item may have a fractional part to cover the omitted sub-time components.

Note: CDF's CDF_EPOCH16 data type uses two 8-byte doubles for each data value. In Python, each value is represented as a complex or numpy.complex128.
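
The component lists accepted for `starttime` and `endtime` can be built from a standard `datetime`. The sketch below does this for CDF_EPOCH and CDF_TIME_TT2000; both function names are hypothetical, and the nanosecond field is always 0 because `datetime` only has microsecond resolution.

```python
from datetime import datetime
from typing import List


def cdf_epoch_components(moment: datetime) -> List[int]:
    """[year, month, day, hour, minute, second, millisec] for CDF_EPOCH."""
    return [
        moment.year, moment.month, moment.day,
        moment.hour, moment.minute, moment.second,
        moment.microsecond // 1000,
    ]


def tt2000_components(moment: datetime) -> List[int]:
    """Append microsec and nanosec fields, as listed for CDF_TIME_TT2000."""
    microsec = moment.microsecond % 1000
    return cdf_epoch_components(moment) + [microsec, 0]  # no nanoseconds in datetime


start = datetime(2020, 1, 2, 3, 4, 5, 678901)
print(cdf_epoch_components(start))  # [2020, 1, 2, 3, 4, 5, 678]
print(tt2000_components(start))     # [2020, 1, 2, 3, 4, 5, 678, 901, 0]
```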

Source code in cdflib/cdfread.py
def varget(
    self,
    variable: Optional[str] = None,
    epoch: Optional[str] = None,
    starttime: Optional[epoch.epoch_types] = None,
    endtime: Optional[epoch.epoch_types] = None,
    startrec: int = 0,
    endrec: Optional[int] = None,
) -> Union[str, np.ndarray]:
    """
    Returns the variable data.

    Parameters
    ----------
    variable: str
        Variable name to fetch.
    startrec: int
        Index of the first record to get.
    endrec : int
        Index of the last record to get. All records from *startrec* to
        *endrec* inclusive are fetched.

    Notes
    -----
    Variable can be entered either
    a name or a variable number. By default, it returns a
    'numpy.ndarray' or 'list' class object, depending on the
    data type, with the variable data and its specification.

    By default, the full variable data is returned. To acquire
    only a portion of the data for a record-varying variable,
    either the time or record (0-based) range can be specified.
    'epoch' can be used to specify which time variable this
    variable depends on and is to be searched for the time range.
    For the ISTP-compliant CDFs, the time variable will come from
    the attribute 'DEPEND_0' from this variable. The function will
    automatically search for it thus no need to specify 'epoch'.
    If either the start or end time is not specified,
    the possible minimum or maximum value for the specific epoch
    data type is assumed. If either the start or end record is not
    specified, the range starts at 0 or/and ends at the last of the
    written data.

    The start (and end) time should be presented in a list as:
    [year month day hour minute second millisec] for CDF_EPOCH
    [year month day hour minute second millisec microsec nanosec picosec] for CDF_EPOCH16
    [year month day hour minute second millisec microsec nanosec] for CDF_TIME_TT2000
    If not enough time components are presented, only the last item can have the floating
    portion for the sub-time components.

    Note: CDF's CDF_EPOCH16 data type uses 2 8-byte doubles for each data value.
    In Python, each value is presented as a complex or numpy.complex128.
    """
    if isinstance(variable, int) and self._num_zvariable > 0 and self._num_rvariable > 0:
        raise ValueError("This CDF has both r and z variables. " "Use variable name instead")

    if (starttime is not None or endtime is not None) and (startrec != 0 or endrec is not None):
        raise ValueError("Can't specify both time and record range")

    vdr_info = self.vdr_info(variable)
    if vdr_info.max_rec < 0:
        raise ValueError(f"No records found for variable {variable}")

    return self._read_vardata(
        vdr_info,
        epoch=epoch,
        starttime=starttime,
        endtime=endtime,
        startrec=startrec,
        endrec=endrec,
    )

varinq

varinq(variable: str) -> VDRInfo

Get basic variable information.

Returns:

Type Description
VDRInfo
Source code in cdflib/cdfread.py
def varinq(self, variable: str) -> VDRInfo:
    """
    Get basic variable information.

    Returns
    -------
    VDRInfo
    """
    vdr_info = self.vdr_info(variable)

    return VDRInfo(
        vdr_info.name,
        vdr_info.variable_number,
        self._variable_token(vdr_info.section_type),
        vdr_info.data_type,
        self._datatype_token(vdr_info.data_type),
        vdr_info.num_elements,
        vdr_info.num_dims,
        vdr_info.dim_sizes,
        self._sparse_token(vdr_info.sparse),
        vdr_info.max_rec,
        vdr_info.record_vary,
        vdr_info.dim_vary,
        vdr_info.compression_level,
        vdr_info.pad,
        vdr_info.blocking_factor,
    )

Sample Usage

To begin accessing the data within a CDF file, first create a CDF object. This can be done with the following commands:

>>> import cdflib
>>> cdf_file = cdflib.CDF('/path/to/cdf_file.cdf')

Then, you can call various methods on the object.

For example

>>> x = cdf_file.varget("NameOfVariable", startrec=0, endrec=150)

This command returns the data in the variable NameOfVariable for records 0 through 150, inclusive.
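
Because `endrec` is inclusive, a request for records 0 to 150 covers 151 records. When a record count is more natural than a last index, a thin wrapper can do the conversion. This is a sketch only: `first_records` is a hypothetical helper, and `cdf` is any object exposing `varget` with the `startrec` and `endrec` keywords documented above.

```python
from typing import Any


def first_records(cdf: Any, variable: str, count: int) -> Any:
    """Fetch the first `count` records of a variable."""
    if count < 1:
        raise ValueError("count must be at least 1")
    # endrec is inclusive, so the last index is count - 1.
    return cdf.varget(variable, startrec=0, endrec=count - 1)


# Usage, assuming cdf_file was created as shown above:
# x = first_records(cdf_file, "NameOfVariable", 151)  # records 0 through 150
```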