fastavro.write

writer(fo, schema, records, codec='null', sync_interval=16000, metadata=None, validator=None, sync_marker=None, codec_compression_level=None)

Write records to fo (stream) according to schema

Parameters:
  • fo (file-like) – Output stream
  • schema (dict) – Writer schema
  • records (iterable) – Records to write. This is commonly a list of the dictionary representation of the records, but it can be any iterable
  • codec (string, optional) – Compression codec, can be ‘null’, ‘deflate’ or ‘snappy’ (if installed)
  • sync_interval (int, optional) – Size of sync interval
  • metadata (dict, optional) – Header metadata
  • validator (None, True or a function) – Validator function. If None (the default) - no validation. If True then then fastavro.validation.validate will be used. If it’s a function, it should have the same signature as fastavro.writer.validate and raise an exeption on error.
  • sync_marker (bytes, optional) – A byte string used as the avro sync marker. If not provided, a random byte string will be used.
  • codec_compression_level (int, optional) – Compression level to use with the specified codec (if the codec supports it)

Example:

from fastavro import writer, parse_schema

schema = {
    'doc': 'A weather reading.',
    'name': 'Weather',
    'namespace': 'test',
    'type': 'record',
    'fields': [
        {'name': 'station', 'type': 'string'},
        {'name': 'time', 'type': 'long'},
        {'name': 'temp', 'type': 'int'},
    ],
}
parsed_schema = parse_schema(schema)

records = [
    {u'station': u'011990-99999', u'temp': 0, u'time': 1433269388},
    {u'station': u'011990-99999', u'temp': 22, u'time': 1433270389},
    {u'station': u'011990-99999', u'temp': -11, u'time': 1433273379},
    {u'station': u'012650-99999', u'temp': 111, u'time': 1433275478},
]

with open('weather.avro', 'wb') as out:
    writer(out, parsed_schema, records)

The fo argument is a file-like object so another common example usage would use an io.BytesIO object like so:

from io import BytesIO
from fastavro import writer

fo = BytesIO()
writer(fo, schema, records)

Given an existing avro file, it’s possible to append to it by re-opening the file in a+b mode. If the file is only opened in ab mode, we aren’t able to read some of the existing header information and an error will be raised. For example:

# Write initial records
with open('weather.avro', 'wb') as out:
    writer(out, parsed_schema, records)

# Write some more records
with open('weather.avro', 'a+b') as out:
    writer(out, parsed_schema, more_records)
schemaless_writer(fo, schema, record)

Write a single record without the schema or header information

Parameters:
  • fo (file-like) – Output file
  • schema (dict) – Schema
  • record (dict) – Record to write

Example:

parsed_schema = fastavro.parse_schema(schema)
with open('file.avro', 'rb') as fp:
    fastavro.schemaless_writer(fp, parsed_schema, record)

Note: The schemaless_writer can only write a single record.

Using the tuple notation to specify which branch of a union to take

Since this library uses plain dictionaries to reprensent a record, it is possible for that dictionary to fit the definition of two different records.

For example, given a dictionary like this:

{"name": "My Name"}

It would be valid against both of these records:

child_schema = {
    "name": "Child",
    "type": "record",
    "fields": [
        {"name": "name", "type": "string"},
        {"name": "favorite_color", "type": ["null", "string"]},
    ]
}

pet_schema = {
    "name": "Pet",
    "type": "record",
    "fields": [
        {"name": "name", "type": "string"},
        {"name": "favorite_toy", "type": ["null", "string"]},
    ]
}

This becomes a problem when a schema contains a union of these two similar records as it is not clear which record the dictionary represents. For example, if you used the previous dictionary with the following schema, it wouldn’t be clear if the record should be serialized as a Child or a `Pet:

household_schema = {
    "name": "Household",
    "type": "record",
    "fields": [
        {"name": "address", "type": "string"},
        {
            "name": "family_members",
            "type": {
                "type": "array", "items": [
                    {
                        "name": "Child",
                        "type": "record",
                        "fields": [
                            {"name": "name", "type": "string"},
                            {"name": "favorite_color", "type": ["null", "string"]},
                        ]
                    }, {
                        "name": "Pet",
                        "type": "record",
                        "fields": [
                            {"name": "name", "type": "string"},
                            {"name": "favorite_toy", "type": ["null", "string"]},
                        ]
                    }
                ]
            }
        },
    ]
}

To resolve this, you can use a tuple notation where the first value of the tuple is the fully namespaced record name and the second value is the dictionary. For example:

records = [
    {
        "address": "123 Drive Street",
        "family_members": [
            ("Child", {"name": "Son"}),
            ("Child", {"name": "Daughter"}),
            ("Pet", {"name": "Dog"}),
        ]
    }
]