fastavro.read

class reader(fo: Union[IO, fastavro.io.json_decoder.AvroJSONDecoder], reader_schema: Union[str, List[T], Dict[KT, VT], None] = None, return_record_name: bool = False, return_record_name_override: bool = False, handle_unicode_errors: str = 'strict', return_named_type: bool = False, return_named_type_override: bool = False)

Iterator over records in an avro file.

Parameters:

fo – File-like object to read from
reader_schema – Reader schema
return_record_name – If true, when reading a union of records, the result will be a tuple where the first value is the name of the record and the second value is the record itself
return_record_name_override – If true, this will modify the behavior of return_record_name so that the record name is only returned for unions where there is more than one record. For unions that only have one record, this option will make it so that the record is returned by itself, not a tuple with the name.
return_named_type – If true, when reading a union of named types, the result will be a tuple where the first value is the name of the type and the second value is the record itself. NOTE: Using this option will ignore return_record_name and return_record_name_override.
return_named_type_override – If true, this will modify the behavior of return_named_type so that the named type is only returned for unions where there is more than one named type. For unions that only have one named type, this option will make it so that the named type is returned by itself, not a tuple with the name.
handle_unicode_errors – Defaults to 'strict'. Should be set to a valid string that can be used in the errors argument of the bytes decode() method. Examples include 'replace' and 'ignore'.

Example:

from fastavro import reader
with open('some-file.avro', 'rb') as fo:
    avro_reader = reader(fo)
    for record in avro_reader:
        process_record(record)

The fo argument is a file-like object, so another common usage is to read from an io.BytesIO object:

from io import BytesIO
from fastavro import writer, reader

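# schema and records are assumed to be defined elsewhere
fo = BytesIO()
writer(fo, schema, records)
fo.seek(0)  # rewind so the reader starts at the file header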
for record in reader(fo):
    process_record(record)

metadata: Key-value pairs in the header metadata

codec: The codec used when writing

writer_schema: The schema used when writing

reader_schema: The schema used when reading (if provided)

class block_reader(fo: IO, reader_schema: Union[str, List[T], Dict[KT, VT], None] = None, return_record_name: bool = False, return_record_name_override: bool = False, handle_unicode_errors: str = 'strict', return_named_type: bool = False, return_named_type_override: bool = False)

Iterator over the Block objects in an avro file.

Parameters:

fo – Input stream
reader_schema – Reader schema
return_record_name – If true, when reading a union of records, the result will be a tuple where the first value is the name of the record and the second value is the record itself
return_record_name_override – If true, this will modify the behavior of return_record_name so that the record name is only returned for unions where there is more than one record. For unions that only have one record, this option will make it so that the record is returned by itself, not a tuple with the name.
return_named_type – If true, when reading a union of named types, the result will be a tuple where the first value is the name of the type and the second value is the record itself. NOTE: Using this option will ignore return_record_name and return_record_name_override.
return_named_type_override – If true, this will modify the behavior of return_named_type so that the named type is only returned for unions where there is more than one named type. For unions that only have one named type, this option will make it so that the named type is returned by itself, not a tuple with the name.
handle_unicode_errors – Defaults to 'strict'. Should be set to a valid string that can be used in the errors argument of the bytes decode() method. Examples include 'replace' and 'ignore'.

Example:

from fastavro import block_reader
with open('some-file.avro', 'rb') as fo:
    avro_reader = block_reader(fo)
    for block in avro_reader:
        process_block(block)

metadata: Key-value pairs in the header metadata

codec: The codec used when writing

writer_schema: The schema used when writing

reader_schema: The schema used when reading (if provided)

class Block(bytes_, num_records, codec, reader_schema, writer_schema, named_schemas, offset, size, options)

An avro block. Yields its records when iterated over.

num_records: Number of records in the block

writer_schema: The schema used when writing

reader_schema: The schema used when reading (if provided)

offset: Offset of the block from the beginning of the avro file

size: Size of the block in bytes

schemaless_reader(fo: IO, writer_schema: Union[str, List[T], Dict[KT, VT]], reader_schema: Union[str, List[T], Dict[KT, VT], None] = None, return_record_name: bool = False, return_record_name_override: bool = False, handle_unicode_errors: str = 'strict', return_named_type: bool = False, return_named_type_override: bool = False) → Union[None, str, float, int, decimal.Decimal, bool, bytes, List[T], Dict[KT, VT]]

Reads a single record written using schemaless_writer().

Parameters:

fo – Input stream
writer_schema – Schema used when calling schemaless_writer
reader_schema – If the schema has changed since the data was written, the new schema can be given here to allow for schema migration
return_record_name – If true, when reading a union of records, the result will be a tuple where the first value is the name of the record and the second value is the record itself
return_record_name_override – If true, this will modify the behavior of return_record_name so that the record name is only returned for unions where there is more than one record. For unions that only have one record, this option will make it so that the record is returned by itself, not a tuple with the name.
return_named_type – If true, when reading a union of named types, the result will be a tuple where the first value is the name of the type and the second value is the record itself. NOTE: Using this option will ignore return_record_name and return_record_name_override.
return_named_type_override – If true, this will modify the behavior of return_named_type so that the named type is only returned for unions where there is more than one named type. For unions that only have one named type, this option will make it so that the named type is returned by itself, not a tuple with the name.
handle_unicode_errors – Defaults to 'strict'. Should be set to a valid string that can be used in the errors argument of the bytes decode() method. Examples include 'replace' and 'ignore'.
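
To make the handle_unicode_errors values concrete, the sketch below shows how the standard library's bytes.decode() treats the same invalid UTF-8 input under each error handler. It uses no fastavro calls; the byte string is a made-up example.

```python
# "café" encoded as Latin-1: the trailing 0xE9 byte is not valid UTF-8.
raw = b"caf\xe9"

# 'replace' substitutes U+FFFD for the undecodable byte.
replaced = raw.decode("utf-8", errors="replace")

# 'ignore' silently drops the undecodable byte.
ignored = raw.decode("utf-8", errors="ignore")

# 'strict' (the default) raises UnicodeDecodeError.
try:
    raw.decode("utf-8", errors="strict")
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

print(repr(replaced), repr(ignored), strict_failed)
```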

Example:

import fastavro

# schema is assumed to be the same schema that was passed to schemaless_writer
parsed_schema = fastavro.parse_schema(schema)
with open('file', 'rb') as fp:
    record = fastavro.schemaless_reader(fp, parsed_schema)

Note: The schemaless_reader can only read a single record.

is_avro(path_or_buffer: Union[str, IO]) → bool

Return True if path (or buffer) points to an Avro file. This only works for avro files that contain the normal avro schema header, like those created by writer(). This function is not intended to be used with binary data created by schemaless_writer(), since that data does not include the avro header.

Parameters:

path_or_buffer – Path to the file, or a file-like object (buffer) to check
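
The header dependence can be illustrated without fastavro. Per the Avro specification, an object container file starts with the four bytes 'O', 'b', 'j', 0x01. The helper looks_like_avro below is hypothetical and only sketches that kind of magic-byte check; it is not presented as fastavro's implementation of is_avro.

```python
from io import BytesIO

# Magic bytes that open every Avro object container file.
AVRO_MAGIC = b"Obj\x01"


def looks_like_avro(buffer):
    """Hypothetical helper: True if the buffer starts with the Avro magic."""
    return buffer.read(len(AVRO_MAGIC)) == AVRO_MAGIC


# A stream that begins like a container file written with a header.
container_like = BytesIO(AVRO_MAGIC + b"rest of file")

# Headerless binary data: a string "Ada" followed by the int 36, which is
# what a schemaless encoding of such a record would look like.
schemaless_like = BytesIO(b"\x06Ada\x48")

print(looks_like_avro(container_like), looks_like_avro(schemaless_like))
```

This is why headerless output from schemaless_writer() cannot be detected this way: nothing in the bytes identifies them as Avro.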
