cdfread¤

Classes:

Name	Description
`CDF`	Read a CDF file into the CDF object. This object contains methods to load the cdf file information, variable names, and values.

CDF ¤

CDF(path: Union[str, Path], validate: bool = False, string_encoding: str = 'ascii', s3_read_method: int = 1)

Read a CDF file into the CDF object. This object contains methods to load the cdf file information, variable names, and values.

Example

>>> import cdflib
>>> cdf_file = cdflib.CDF('/path/to/cdf_file.cdf')
>>> cdf_file.cdf_info()
>>> x = cdf_file.varget("NameOfVariable", startrec=0, endrec=150)

Parameters:

Name	Type	Description	Default
`path` ¤	`(Path, str)`	Path to CDF file. This can be a link to a file in an S3 bucket as well.	required
`validate` ¤	`bool`	If True, validate the MD5 checksum of the CDF file.	`False`
`string_encoding` ¤	`str`	The encoding used to read strings. Defaults to 'ascii', which is what the CDF internal format description prescribes as the encoding for character strings. Other encodings may have been used to create files however, and this keyword argument gives users the flexibility to read those files.	`'ascii'`
`s3_read_method` ¤	`int`	If the file lives in an AWS S3 bucket, this parameter defines how the file is read. The choices are: 1 reads the entire file into memory; 2 downloads the file to a temporary directory; 3 reads the file in chunks directly from S3 over HTTPS.	`1`
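
As a rough illustration of how `path` appears to be interpreted, based on the constructor source listed on this page, the sketch below classifies a path as S3, URL, or local file and applies the `.cdf` suffix fallback. The helper names `classify_source` and `resolve_local` are hypothetical and are not part of cdflib.

```python
from pathlib import Path
from typing import Union


def classify_source(path: Union[str, Path]) -> str:
    """Return "s3", "url", or "file" for a path (hypothetical helper)."""
    # A Path object is converted to an absolute POSIX string first,
    # so only plain strings can be recognised as S3 or URL sources.
    fname = path.absolute().as_posix() if isinstance(path, Path) else path
    if fname.startswith("s3://"):
        return "s3"
    if fname.startswith(("http://", "https://")):
        return "url"
    return "file"


def resolve_local(path: Union[str, Path]) -> Path:
    """Resolve a local path, retrying with a ".cdf" suffix if it is missing."""
    candidate = Path(path).resolve().expanduser()
    if not candidate.is_file():
        # with_suffix replaces an existing suffix, e.g. "data.v1" -> "data.cdf".
        candidate = candidate.with_suffix(".cdf")
        if not candidate.is_file():
            raise FileNotFoundError(f"{candidate} not found")
    return candidate
```

One consequence of this logic is that an S3 or HTTP location has to be passed as a string; wrapping it in a `Path` would make it look like a local file.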

Notes

An open file handle to the CDF file remains whilst a CDF object is live. It is automatically cleaned up when the CDF instance is deleted.

Methods:

Name	Description
`attget`	Returns the value of the attribute at the entry number provided.
`attinq`	Get attribute information.
`cdf_info`	Returns basic CDF information.
`globalattsget`	Gets all global attributes.
`varattsget`	Gets all variable attributes.
`varget`	Returns the variable data.
`varinq`	Get basic variable information.

Source code in cdflib/cdfread.py

def __init__(self, path: Union[str, Path], validate: bool = False, string_encoding: str = "ascii", s3_read_method: int = 1):
    """
    Parameters
    ----------
    path : Path, str
        Path to CDF file.  This can be a link to a file in an S3 bucket as well.
    validate : bool, optional
        If True, validate the MD5 checksum of the CDF file.
    string_encoding : str, optional
        The encoding used to read strings. Defaults to 'ascii', which is what
        the CDF internal format description prescribes as the encoding for
        character strings. Other encodings may have been used to create files
        however, and this keyword argument gives users the flexibility to read
        those files.
    s3_read_method: int, optional
        If the user is specifying a file that lives within an AWS S3 bucket, this variable
        defines how the file is read in.  The choices are:
        - 1 will read the entire file into memory
        - 2 will download the file to a tmp directory
        - 3 reads the file in chunks directly from S3 over https

    Notes
    -----
    An open file handle to the CDF file remains whilst a CDF object is live.
    It is automatically cleaned up when the CDF instance is deleted.
    """
    if isinstance(path, Path):
        fname = path.absolute().as_posix()
    else:
        fname = path

    self.file: Union[str, Path]
    if fname.startswith("s3://"):
        # later put in s3 'does it exist' checker
        self.ftype = "s3"
        self.file = fname  # path for files, fname for urls and S3
    elif fname.startswith("http://") or fname.startswith("https://"):
        # later put in url 404 'does it exist' checker
        self.ftype = "url"
        self.file = fname  # path for files, fname for urls and S3
    else:
        self.ftype = "file"
        path = Path(path).resolve().expanduser()
        if not path.is_file():
            path = path.with_suffix(".cdf")
            if not path.is_file():
                raise FileNotFoundError(f"{path} not found")
        self.file = path  # path for files, fname for urls and S3

    self.string_encoding = string_encoding

    self._f = self._file_or_url_or_s3_handler(str(self.file), self.ftype, s3_read_method)
    magic_number = self._f.read(4).hex()
    compressed_bool = self._f.read(4).hex()

    if magic_number not in ("cdf30001", "cdf26002", "0000ffff"):
        raise OSError(f"{path} is not a CDF file or a non-supported CDF!")

    self.cdfversion = 3 if magic_number == "cdf30001" else 2

    self._compressed = not (compressed_bool == "0000ffff")
    self.compressed_file = None
    self.temp_file: Optional[Path] = None

    if self._compressed:
        if self.ftype == "url" or self.ftype == "s3":
            if s3_read_method == 3:
                # extra step, read entire file
                self._f.seek(0)
                self._f = s3_fetchall(self._f.fhandle)  # type: ignore
            self._unstream_file(self._f)
            path = self.file
        self._uncompress_file()
        if self.temp_file is None:
            raise OSError("Decompression was unsuccessful.  Only GZIP compression is currently supported.")

        self.compressed_file = self.file
        self.file = self.temp_file
        self._f.close()
        self._f = self.file.open("rb")
        self.ftype = "file"

    if self.cdfversion == 3:
        cdr_info, foffs = self._read_cdr(8)
        gdr_info = self._read_gdr(foffs)
    else:
        cdr_info, foffs = self._read_cdr2(8)
        gdr_info = self._read_gdr2(foffs)

    if cdr_info.md5 and validate:
        if not self._md5_validation():
            raise OSError("This file fails the md5 checksum.")

    if not cdr_info.format_:
        raise OSError("This package does not support multi-format CDF")

    if cdr_info.encoding in (3, 14, 15):
        raise OSError("This package does not support CDFs with this " + self._encoding_token(cdr_info.encoding) + " encoding")

    # SET GLOBAL VARIABLES
    self._post25 = cdr_info.post25
    self._version = cdr_info.version
    self._encoding = cdr_info.encoding
    self._majority = self._major_token(cdr_info.majority)
    self._copyright = cdr_info.copyright_
    self._md5 = cdr_info.md5
    self._first_zvariable = gdr_info.first_zvariable
    self._first_rvariable = gdr_info.first_rvariable
    self._first_adr = gdr_info.first_adr
    self._num_zvariable = gdr_info.num_zvariables
    self._num_rvariable = gdr_info.num_rvariables
    self._rvariables_num_dims = gdr_info.rvariables_num_dims
    self._rvariables_dim_sizes = gdr_info.rvariables_dim_sizes
    self._num_att = gdr_info.num_attributes
    self._num_rdim = gdr_info.rvariables_num_dims
    self._rdim_sizes = gdr_info.rvariables_dim_sizes
    if self.cdfversion == 3:
        self._leap_second_updated = gdr_info.leapsecond_updated

    if self.compressed_file is not None:
        self.compressed_file = None
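
The header check at the start of the constructor can be sketched in isolation. The function below is a simplified, standalone illustration that works on the first eight bytes of a file; `sniff_cdf_header` is a hypothetical name and not a cdflib function.

```python
from typing import Tuple


def sniff_cdf_header(header: bytes) -> Tuple[int, bool]:
    """Return (cdf_version, is_compressed) from the first 8 bytes of a file."""
    if len(header) < 8:
        raise ValueError("need at least 8 header bytes")
    magic_number = header[:4].hex()     # first 4-byte word
    compressed_word = header[4:8].hex()  # second 4-byte word

    # Same set of accepted magic numbers as in the listing above.
    if magic_number not in ("cdf30001", "cdf26002", "0000ffff"):
        raise OSError("not a CDF file or a non-supported CDF")

    version = 3 if magic_number == "cdf30001" else 2
    # Any second word other than 0000ffff is treated as "compressed".
    compressed = compressed_word != "0000ffff"
    return version, compressed
```

For example, `sniff_cdf_header(bytes.fromhex("cdf300010000ffff"))` reports an uncompressed version 3 file.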

attget ¤

attget(attribute: Union[str, int], entry: Optional[Union[str, int]] = None) -> AttData

Returns the value of the attribute at the entry number provided.

A variable name can be used instead of its corresponding entry number.

Parameters:

Name	Type	Description	Default
`attribute` ¤	`(str, int)`	Attribute name or number to get.	required
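`entry` ¤	`(str, int)`	Entry number to read. For a variable attribute, a variable name can be given instead of the entry number.	`None`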

Returns:

Type	Description
`AttData`
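
The name matching that `attget` performs can be pictured with a small standalone sketch. It assumes, as the source listing suggests, that names are compared case-insensitively after stripping whitespace and that a string `entry` skips global attributes (scope `1`). The `Attribute` class, the scope value `2` for variable attributes, and `find_attribute` are hypothetical stand-ins, not cdflib types.

```python
from dataclasses import dataclass
from typing import List, Optional, Union

GLOBAL_SCOPE = 1  # the source listing treats scope == 1 as "global"


@dataclass
class Attribute:
    name: str
    scope: int  # any value other than 1 is treated as variable scope here


def find_attribute(
    attributes: List[Attribute],
    name: str,
    entry: Optional[Union[str, int]] = None,
) -> Attribute:
    """Return the first attribute matching `name` (hypothetical helper)."""
    wanted = name.strip().lower()
    for attribute in attributes:
        if attribute.name.strip().lower() != wanted:
            continue
        # A string entry is a variable name, so a global attribute
        # that happens to share the name is skipped.
        if isinstance(entry, str) and attribute.scope == GLOBAL_SCOPE:
            continue
        return attribute
    raise KeyError(f"No attribute {name} for entry {entry}")
```

With a global and a variable attribute both called `Units`, an integer entry would pick the first match, while a variable name such as `"Bx"` would pick the variable-scoped one.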

Source code in cdflib/cdfread.py

def attget(self, attribute: Union[str, int], entry: Optional[Union[str, int]] = None) -> AttData:
    """
    Returns the value of the attribute at the entry number provided.

    A variable name can be used instead of its corresponding
    entry number.

    Parameters
    ----------
    attribute : str, int
        Attribute name or number to get.
    entry : int, optional

    Returns
    -------
    AttData
    """
    # Starting position
    position = self._first_adr

    # Get Correct ADR
    adr_info = None
    if isinstance(attribute, str):
        for _ in range(0, self._num_att):
            name, next_adr = self._read_adr_fast(position)
            if name.strip().lower() == attribute.strip().lower():
                adr_info = self._read_adr(position)
                if isinstance(entry, str) and adr_info.scope == 1:
                    # If the user has specified a string entry, they are obviously looking for a variable attribute.
                    # Filter out any global attributes that may have the same name.
                    adr_info = None
                    position = next_adr
                    continue
                break
            else:
                position = next_adr

        if adr_info is None:
            raise KeyError(f"No attribute {attribute} for entry {entry}")

    elif isinstance(attribute, int):
        if (attribute < 0) or (attribute > self._num_att):
            raise KeyError(f"No attribute {attribute}")
        if not isinstance(entry, int):
            raise TypeError(f"{entry} has to be a number.")

        for _ in range(0, attribute):
            name, next_adr = self._read_adr_fast(position)
            position = next_adr
        adr_info = self._read_adr(position)
    else:
        raise ValueError("Please set attribute keyword equal to " "the name or number of an attribute")

    # Find the correct entry from the "entry" variable
    if adr_info.scope == 1:
        if not isinstance(entry, int):
            raise ValueError('"entry" must be an integer')
        num_entry_string = "num_gr_entry"
        first_entry_string = "first_gr_entry"
        max_entry_string = "max_gr_entry"
        entry_num = entry
    else:
        var_num = -1
        zvar = False
        if isinstance(entry, str):
            # a zVariable?
            positionx = self._first_zvariable
            for x in range(0, self._num_zvariable):
                name, vdr_next = self._read_vdr_fast(positionx)
                if name.strip().lower() == entry.strip().lower():
                    var_num = x
                    zvar = True
                    break
                positionx = vdr_next
            if var_num == -1:
                # a rVariable?
                positionx = self._first_rvariable
                for x in range(0, self._num_rvariable):
                    name, vdr_next = self._read_vdr_fast(positionx)
                    if name.strip().lower() == entry.strip().lower():
                        var_num = x
                        break
                    positionx = vdr_next
            if var_num == -1:
                raise ValueError(f"No variable by this name: {entry}")
            entry_num = var_num
        else:
            if self._num_zvariable > 0 and self._num_rvariable > 0:
                raise ValueError("This CDF has both r and z variables. " "Use variable name instead")
            if self._num_zvariable > 0:
                zvar = True
            entry_num = entry
        if zvar:
            num_entry_string = "num_z_entry"
            first_entry_string = "first_z_entry"
            max_entry_string = "max_z_entry"
        else:
            num_entry_string = "num_gr_entry"
            first_entry_string = "first_gr_entry"
            max_entry_string = "max_gr_entry"
    if entry_num > getattr(adr_info, max_entry_string):
        raise ValueError("The entry does not exist")
    return self._get_attdata(adr_info, entry_num, getattr(adr_info, num_entry_string), getattr(adr_info, first_entry_string))

attinq ¤

attinq(attribute: Union[str, int]) -> ADRInfo

Get attribute information.

Parameters:

Name	Type	Description	Default
`attribute` ¤	`(str, int)`	Attribute to get information for.	required

Returns:

Type	Description
`ADRInfo`

Source code in cdflib/cdfread.py

def attinq(self, attribute: Union[str, int]) -> ADRInfo:
    """
    Get attribute information.

    Parameters
    ----------
    attribute : str, int
        Attribute to get information for.

    Returns
    -------
    ADRInfo
    """
    position = self._first_adr
    if isinstance(attribute, str):
        for _ in range(0, self._num_att):
            name, next_adr = self._read_adr_fast(position)
            if name.strip().lower() == attribute.strip().lower():
                return self._read_adr(position)

            position = next_adr
        raise KeyError(f"No attribute {attribute}")

    elif isinstance(attribute, int):
        if attribute < 0 or attribute > self._num_att:
            raise KeyError(f"No attribute {attribute}")
        for _ in range(0, attribute):
            name, next_adr = self._read_adr_fast(position)
            position = next_adr

        return self._read_adr(position)
    else:
        raise ValueError("attribute keyword must be a string or integer")

cdf_info ¤

cdf_info() -> CDFInfo

Returns basic CDF information.

Returns:

Type	Description
`CDFInfo`

Source code in cdflib/cdfread.py

def cdf_info(self) -> CDFInfo:
    """
    Returns basic CDF information.

    Returns
    -------
    CDFInfo
    """
    varnames = self._get_varnames()
    return CDFInfo(
        self.file,
        self._version,
        self._encoding,
        self._majority,
        varnames[0],
        varnames[1],
        self._get_attnames(),
        self._copyright,
        self._md5,
        self._num_rdim,
        self._rdim_sizes,
        self._compressed,
    )

globalattsget ¤

globalattsget() -> Dict[str, List[Union[str, ndarray]]]

Gets all global attributes.

This function returns all of the global attribute entries from a CDF, in a dictionary that maps each attribute name to a list of its entry values.
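
To make the shape of the result concrete, here is a minimal sketch of how entry values seem to be gathered per attribute, with single-element arrays unwrapped to scalars. Plain Python lists stand in for NumPy arrays so the example has no dependencies; `collapse_singletons` and the attribute names are hypothetical.

```python
from typing import Any, Dict, List


def collapse_singletons(entries: List[Any]) -> List[Any]:
    """Unwrap one-element sequences, leaving other entries untouched."""
    collapsed = []
    for entry in entries:
        # Lists play the role of numpy arrays in this sketch.
        if isinstance(entry, list) and len(entry) == 1:
            entry = entry[0]
        collapsed.append(entry)
    return collapsed


# Hypothetical raw entries for two global attributes.
raw: Dict[str, List[Any]] = {
    "Project": ["ExampleMission"],
    "Version": [[3.0], [1, 2]],
}
global_attributes = {name: collapse_singletons(values) for name, values in raw.items()}
```

Each key keeps a list of entries, so an attribute with a single entry still comes back as a one-item list.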

Source code in cdflib/cdfread.py

def globalattsget(self) -> Dict[str, List[Union[str, np.ndarray]]]:
    """
    Gets all global attributes.

    This function returns all of the global attribute entries
    from a CDF, in a dictionary that maps each attribute name
    to a list of its entry values.
    """
    byte_loc = self._first_adr
    return_dict: Dict[str, List[Union[str, np.ndarray]]] = {}
    for _ in range(self._num_att):
        adr_info = self._read_adr(byte_loc)
        if adr_info.scope != 1:
            byte_loc = adr_info.next_adr_loc
            continue
        if adr_info.num_gr_entry == 0:
            byte_loc = adr_info.next_adr_loc
            continue
        entries = []
        aedr_byte_loc = adr_info.first_gr_entry
        for _ in range(adr_info.num_gr_entry):
            aedr_info = self._read_aedr(aedr_byte_loc)
            entryData = aedr_info.entry
            # This exists to get rid of extraneous numpy arrays
            if isinstance(entryData, np.ndarray):
                if len(entryData) == 1:
                    entryData = entryData[0]

            entries.append(entryData)
            aedr_byte_loc = aedr_info.next_aedr

        return_dict[adr_info.name] = entries
        byte_loc = adr_info.next_adr_loc

    return return_dict

varattsget ¤

varattsget(variable: Union[str, int]) -> Dict[str, Union[None, str, ndarray]]

Gets all variable attributes.

Unlike attget, which returns a single attribute entry value, this function returns all of the variable attribute entries, in a dictionary (in the form of 'attribute': value pair) for a variable.
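
When a variable is given by name, the source listing searches zVariables first and rVariables second. The sketch below mirrors that order on plain lists of names; `find_variable` is a hypothetical helper, and the list index stands in for the variable number that cdflib reads from the file.

```python
from typing import List, Tuple


def find_variable(zvariables: List[str], rvariables: List[str], name: str) -> Tuple[int, bool]:
    """Return (index, is_zvariable) for the first case-insensitive match."""
    wanted = name.strip().lower()
    # zVariables are searched before rVariables.
    for is_zvar, names in ((True, zvariables), (False, rvariables)):
        for number, candidate in enumerate(names):
            if candidate.strip().lower() == wanted:
                return number, is_zvar
    raise ValueError(f"No variable by this name: {name}")
```

Looking a variable up by number is only unambiguous when the file holds one kind of variable, which is why the method rejects an integer when both r and z variables are present.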

Source code in cdflib/cdfread.py

def varattsget(self, variable: Union[str, int]) -> Dict[str, Union[None, str, np.ndarray]]:
    """
    Gets all variable attributes.

    Unlike attget, which returns a single attribute entry value,
    this function returns all of the variable attribute entries,
    in a dictionary (in the form of 'attribute': value pair) for
    a variable.
    """
    if isinstance(variable, int) and self._num_zvariable > 0 and self._num_rvariable > 0:
        raise ValueError("This CDF has both r and z variables. Use variable name")
    if isinstance(variable, str):
        position = self._first_zvariable
        num_variables = self._num_zvariable
        for zVar in [True, False]:
            for _ in range(0, num_variables):
                name, vdr_next = self._read_vdr_fast(position)
                if name.strip().lower() == variable.strip().lower():
                    vdr_info = self._read_vdr(position)
                    return self._read_varatts(vdr_info.variable_number, zVar)
                position = vdr_next
            position = self._first_rvariable
            num_variables = self._num_rvariable
        raise ValueError(f"No variable by this name: {variable}")
    elif isinstance(variable, int):
        if self._num_zvariable > 0:
            num_variable = self._num_zvariable
            zVar = True
        else:
            num_variable = self._num_rvariable
            zVar = False
        if variable < 0 or variable >= num_variable:
            raise ValueError(f"No variable by this number: {variable}")
        return self._read_varatts(variable, zVar)

varget ¤

varget(variable: Optional[str] = None, epoch: Optional[str] = None, starttime: Optional[epoch_types] = None, endtime: Optional[epoch_types] = None, startrec: int = 0, endrec: Optional[int] = None) -> Union[str, ndarray]

Returns the variable data.

Parameters:

Name	Type	Description	Default
`variable` ¤	`Optional[str]`	Variable name to fetch.	`None`
`startrec` ¤	`int`	Index of the first record to get.	`0`
`endrec` ¤	`int`	Index of the last record to get. All records from startrec to endrec inclusive are fetched.	`None`
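
Because `endrec` is inclusive, a record range behaves differently from a Python slice. The following sketch shows the intended selection on an ordinary list; `select_records` is a hypothetical helper, not the code cdflib uses to read records.

```python
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")


def select_records(records: Sequence[T], startrec: int = 0, endrec: Optional[int] = None) -> List[T]:
    """Return records startrec..endrec, with endrec included."""
    if endrec is None:
        # No end record given: read through the last record.
        endrec = len(records) - 1
    return list(records[startrec : endrec + 1])
```

Under this reading, `startrec=0, endrec=150` yields 151 records rather than 150.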

Notes

The variable can be given as either a name or a variable number. By default, the method returns a 'numpy.ndarray' or 'list' object, depending on the data type, with the variable data and its specification.

By default, the full variable data is returned. To acquire only a portion of the data for a record-varying variable, either a time range or a record range (0-based) can be specified. 'epoch' can be used to specify which time variable this variable depends on; that variable is then searched for the time range. For ISTP-compliant CDFs, the time variable comes from this variable's 'DEPEND_0' attribute, and the function searches for it automatically, so 'epoch' need not be specified. If the start or end time is not specified, the minimum or maximum possible value for the epoch data type is assumed. If the start or end record is not specified, the range starts at record 0 and/or ends at the last written record.

The start (and end) time should be presented in a list as:

- [year month day hour minute second millisec] for CDF_EPOCH
- [year month day hour minute second millisec microsec nanosec picosec] for CDF_EPOCH16
- [year month day hour minute second millisec microsec nanosec] for CDF_TIME_TT2000

If fewer time components are given, only the last item may carry a fractional part to represent the omitted sub-components.
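
The rules above can be summarised in a short sketch: a time range and a record range are mutually exclusive, and each epoch type expects a different number of time components. `check_range_arguments` and `TIME_COMPONENTS` are hypothetical names used for illustration; the first mirrors the argument check in the `varget` source listing.

```python
from typing import List, Optional

# Number of components in a fully specified time list, per epoch type.
TIME_COMPONENTS = {
    "CDF_EPOCH": 7,        # year .. millisec
    "CDF_EPOCH16": 10,     # year .. picosec
    "CDF_TIME_TT2000": 9,  # year .. nanosec
}


def check_range_arguments(
    starttime: Optional[List[float]] = None,
    endtime: Optional[List[float]] = None,
    startrec: int = 0,
    endrec: Optional[int] = None,
) -> None:
    """Raise ValueError if both a time range and a record range are given."""
    time_given = starttime is not None or endtime is not None
    records_given = startrec != 0 or endrec is not None
    if time_given and records_given:
        raise ValueError("Can't specify both time and record range")


# A fully specified CDF_EPOCH start time: 2020-01-01 00:00:00.000
start = [2020, 1, 1, 0, 0, 0, 0]
check_range_arguments(starttime=start)  # accepted: time range only
```

Note that `startrec=0` counts as "not specified" in this check, since it is the default.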

Note: CDF's CDF_EPOCH16 data type uses 2 8-byte doubles for each data value. In Python, each value is presented as a complex or numpy.complex128.
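
As a sketch of what "two 8-byte doubles presented as a complex" can look like, the snippet below unpacks 16 bytes into a Python `complex`. The byte order and the mapping of the first double to the real part and the second to the imaginary part are assumptions made for this illustration; `decode_epoch16` is a hypothetical helper, not a cdflib function.

```python
import struct


def decode_epoch16(raw: bytes, byte_order: str = ">") -> complex:
    """Combine two consecutive 8-byte doubles into one complex value."""
    if len(raw) != 16:
        raise ValueError("a CDF_EPOCH16 value occupies exactly 16 bytes")
    # Assumed layout: first double -> real part, second double -> imaginary part.
    first, second = struct.unpack(f"{byte_order}dd", raw)
    return complex(first, second)


# Round-trip an arbitrary pair of doubles to show the pairing.
packed = struct.pack(">dd", 63745056000.0, 123456.0)
value = decode_epoch16(packed)
```

Both halves remain accessible afterwards through `value.real` and `value.imag`.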

Source code in cdflib/cdfread.py

def varget(
    self,
    variable: Optional[str] = None,
    epoch: Optional[str] = None,
    starttime: Optional[epoch.epoch_types] = None,
    endtime: Optional[epoch.epoch_types] = None,
    startrec: int = 0,
    endrec: Optional[int] = None,
) -> Union[str, np.ndarray]:
    """
    Returns the variable data.

    Parameters
    ----------
    variable: str
        Variable name to fetch.
    startrec: int
        Index of the first record to get.
    endrec : int
        Index of the last record to get. All records from *startrec* to
        *endrec* inclusive are fetched.

    Notes
    -----
    The variable can be given as either
    a name or a variable number. By default, it returns a
    'numpy.ndarray' or 'list' class object, depending on the
    data type, with the variable data and its specification.

    By default, the full variable data is returned. To acquire
    only a portion of the data for a record-varying variable,
    either the time or record (0-based) range can be specified.
    'epoch' can be used to specify which time variable this
    variable depends on and is to be searched for the time range.
    For the ISTP-compliant CDFs, the time variable will come from
    the attribute 'DEPEND_0' from this variable. The function will
    automatically search for it thus no need to specify 'epoch'.
    If either the start or end time is not specified,
    the possible minimum or maximum value for the specific epoch
    data type is assumed. If either the start or end record is not
    specified, the range starts at 0 or/and ends at the last of the
    written data.

    The start (and end) time should be presented in a list as:
    [year month day hour minute second millisec] for CDF_EPOCH
    [year month day hour minute second millisec microsec nanosec picosec] for CDF_EPOCH16
    [year month day hour minute second millisec microsec nanosec] for CDF_TIME_TT2000
    If not enough time components are presented, only the last item can have the floating
    portion for the sub-time components.

    Note: CDF's CDF_EPOCH16 data type uses 2 8-byte doubles for each data value.
    In Python, each value is presented as a complex or numpy.complex128.
    """
    if isinstance(variable, int) and self._num_zvariable > 0 and self._num_rvariable > 0:
        raise ValueError("This CDF has both r and z variables. " "Use variable name instead")

    if (starttime is not None or endtime is not None) and (startrec != 0 or endrec is not None):
        raise ValueError("Can't specify both time and record range")

    vdr_info = self.vdr_info(variable)
    if vdr_info.max_rec < 0:
        raise ValueError(f"No records found for variable {variable}")

    return self._read_vardata(
        vdr_info,
        epoch=epoch,
        starttime=starttime,
        endtime=endtime,
        startrec=startrec,
        endrec=endrec,
    )

varinq ¤

varinq(variable: str) -> VDRInfo

Get basic variable information.

Returns:

Type	Description
`VDRInfo`

Source code in cdflib/cdfread.py

def varinq(self, variable: str) -> VDRInfo:
    """
    Get basic variable information.

    Returns
    -------
    VDRInfo
    """
    vdr_info = self.vdr_info(variable)

    return VDRInfo(
        vdr_info.name,
        vdr_info.variable_number,
        self._variable_token(vdr_info.section_type),
        vdr_info.data_type,
        self._datatype_token(vdr_info.data_type),
        vdr_info.num_elements,
        vdr_info.num_dims,
        vdr_info.dim_sizes,
        self._sparse_token(vdr_info.sparse),
        vdr_info.max_rec,
        vdr_info.record_vary,
        vdr_info.dim_vary,
        vdr_info.compression_level,
        vdr_info.pad,
        vdr_info.blocking_factor,
    )

Sample Usage¤

To begin accessing the data within a CDF file, first create a CDF object. This can be done with the following commands:

>>> import cdflib
>>> cdf_file = cdflib.CDF('/path/to/cdf_file.cdf')

Then, you can call the object's methods.

For example:

>>> x = cdf_file.varget("NameOfVariable", startrec = 0, endrec = 150)

This command returns the data in the variable NameOfVariable for records 0 through 150, inclusive.
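
A slightly longer sketch ties the documented methods together. It only uses calls that appear on this page (`cdf_info`, `globalattsget`, `varattsget`, `varget`); the `summarize_cdf` helper, its arguments, and the layout of the returned dictionary are hypothetical. cdflib is imported inside the function so that the definition itself does not require the package.

```python
from pathlib import Path
from typing import Any, Dict, Sequence, Union


def summarize_cdf(path: Union[str, Path], variables: Sequence[str]) -> Dict[str, Any]:
    """Collect file info, global attributes, and data for the named variables."""
    import cdflib  # imported lazily; requires cdflib to be installed

    cdf_file = cdflib.CDF(path)
    summary: Dict[str, Any] = {
        "info": cdf_file.cdf_info(),
        "global_attributes": cdf_file.globalattsget(),
        "variables": {},
    }
    for name in variables:
        summary["variables"][name] = {
            "attributes": cdf_file.varattsget(name),
            # varget raises ValueError for a variable without records.
            "data": cdf_file.varget(name),
        }
    return summary
```

A call such as `summarize_cdf('/path/to/cdf_file.cdf', ["NameOfVariable"])` would then return everything in one dictionary, provided the file and variable exist.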
