Redesigning Zig IO API
Input-output, being one of the most fundamental systems in any programming language, was probably one of the first to be designed in Zig’s standard library. As Zig grew and gained features it went through a few redesigns, but it is still not without issues, so here I want to analyze how it is currently implemented and whether we can now do better.
Current Zig IO API
We will start at the lowest level with std.fs.File representing a file on the file system and
std.net.Stream representing a network stream. Both of these have the following methods:
| |
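As a rough sketch of that shape, here is a hypothetical in-memory ToyStream (not part of std) with read() and write() methods in the same style. The error sets and the 64-byte capacity are made up for illustration.

```zig
const std = @import("std");

/// Hypothetical in-memory stream, standing in for std.fs.File or std.net.Stream.
const ToyStream = struct {
    data: [64]u8 = undefined,
    len: usize = 0, // number of valid bytes in `data`
    pos: usize = 0, // current read position

    pub const ReadError = error{};
    pub const WriteError = error{NoSpaceLeft};

    /// Copies up to buffer.len bytes into buffer and returns how many were
    /// copied. Returning 0 means the end of the stream was reached.
    pub fn read(self: *ToyStream, buffer: []u8) ReadError!usize {
        const n = @min(buffer.len, self.len - self.pos);
        @memcpy(buffer[0..n], self.data[self.pos..][0..n]);
        self.pos += n;
        return n;
    }

    /// Appends as many bytes as fit and returns how many were accepted.
    pub fn write(self: *ToyStream, bytes: []const u8) WriteError!usize {
        const free = self.data.len - self.len;
        if (free == 0 and bytes.len > 0) return error.NoSpaceLeft;
        const n = @min(bytes.len, free);
        @memcpy(self.data[self.len..][0..n], bytes[0..n]);
        self.len += n;
        return n;
    }
};
```

Both methods may process fewer bytes than requested, which is why the helper methods discussed later (such as readAll and writeAll) are useful.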
They also have a close() method and methods to fetch their Reader and Writer objects, which we
will explain later. File also has many other methods that only make sense for a file and not
for a network stream. One set of these that is of interest here is the seek methods, together with a
method to fetch a SeekableStream from a File. They are:
| |
std.io.SeekableStream is a generic struct that contains a pointer to the original stream and the
same seek methods, each of which just calls its counterpart on that original stream. That way it acts as a
sort of interface. File, of course, defines its own instance of this type so that the pointer in
that instance is of type *File and same goes for std.net.Stream.
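To make the wrapping concrete, here is a simplified sketch of the pattern. It is not the std definition: only two seek functions are forwarded and a single error set is used, and the Cursor type is a hypothetical stand-in for File.

```zig
const std = @import("std");

/// Simplified, hypothetical version of the generic wrapper.
fn SeekableStream(
    comptime Context: type,
    comptime SeekError: type,
    comptime seekToFn: fn (context: Context, pos: u64) SeekError!void,
    comptime getPosFn: fn (context: Context) SeekError!u64,
) type {
    return struct {
        context: Context,

        const Self = @This();

        // Each method only forwards to the function passed at comptime.
        pub fn seekTo(self: Self, pos: u64) SeekError!void {
            return seekToFn(self.context, pos);
        }

        pub fn getPos(self: Self) SeekError!u64 {
            return getPosFn(self.context);
        }
    };
}

/// Hypothetical stream with a cursor, standing in for File.
const Cursor = struct {
    pos: u64 = 0,
    end: u64 = 100,

    pub const SeekError = error{OutOfRange};

    pub fn seekTo(self: *Cursor, pos: u64) SeekError!void {
        if (pos > self.end) return error.OutOfRange;
        self.pos = pos;
    }

    pub fn getPos(self: *Cursor) SeekError!u64 {
        return self.pos;
    }

    // The stream defines its own instance of the generic type...
    pub const Seekable = SeekableStream(*Cursor, SeekError, seekTo, getPos);

    // ...and a method that hands it out.
    pub fn seekableStream(self: *Cursor) Seekable {
        return .{ .context = self };
    }
};
```

Note that Cursor.Seekable is a distinct type from the instance any other stream would define, which is what forces callers to be generic.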
Reader and Writer
std.io.Reader and std.io.Writer are also generic structs that contain a pointer to the original
stream, but besides wrapping the read() and write() methods respectively they also provide
additional methods or, in other words, additional behavior. Some examples for Reader are:
| |
And some examples for Writer are:
| |
So the Reader and Writer wrap an existing method and provide additional ones while
SeekableStream only wraps existing methods.
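The difference is easier to see in code. Below is a simplified sketch of a generic Reader, assuming only a context, an error set and a read function as parameters. The Trickle stream is hypothetical and deliberately returns at most two bytes per call so that the added readAll() behavior is visible.

```zig
const std = @import("std");

/// Simplified, hypothetical version of the generic Reader.
fn Reader(
    comptime Context: type,
    comptime ReadError: type,
    comptime readFn: fn (context: Context, buffer: []u8) ReadError!usize,
) type {
    return struct {
        context: Context,

        const Self = @This();
        pub const Error = ReadError;

        /// Wrapped method: forwards to the underlying stream.
        pub fn read(self: Self, buffer: []u8) Error!usize {
            return readFn(self.context, buffer);
        }

        /// Added behavior: keeps calling read() until the buffer is full
        /// or the stream ends. Returns the number of bytes read.
        pub fn readAll(self: Self, buffer: []u8) Error!usize {
            var index: usize = 0;
            while (index < buffer.len) {
                const n = try self.read(buffer[index..]);
                if (n == 0) break;
                index += n;
            }
            return index;
        }

        /// Added behavior: reads exactly one byte.
        pub fn readByte(self: Self) (Error || error{EndOfStream})!u8 {
            var one: [1]u8 = undefined;
            const n = try self.read(&one);
            if (n == 0) return error.EndOfStream;
            return one[0];
        }
    };
}

/// Hypothetical stream that hands out at most two bytes per read() call.
const Trickle = struct {
    data: []const u8,
    pos: usize = 0,

    pub const ReadError = error{};

    pub fn read(self: *Trickle, buffer: []u8) ReadError!usize {
        const remaining = self.data.len - self.pos;
        const n: usize = @min(buffer.len, remaining, 2);
        @memcpy(buffer[0..n], self.data[self.pos..][0..n]);
        self.pos += n;
        return n;
    }

    pub const TrickleReader = Reader(*Trickle, ReadError, read);

    pub fn reader(self: *Trickle) TrickleReader {
        return .{ .context = self };
    }
};
```

Only read() depends on the stream. readAll() and readByte() are written once, in terms of read(), and every stream gets them for free.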
StreamSource
When you write a loader or a parser you often want to support two ways of using it. One lets the user just specify the file from which to parse the data, and the other lets the user load the data themselves and then provide the buffer from which to parse.
Reading a buffer as a stream is provided by std.io.FixedBufferStream, which provides read(),
write() and seek methods, plus methods to fetch a Reader, Writer and SeekableStream over
that buffer.
That way if some loader function accepts any reader you can pass it either File.reader() or
FixedBufferStream.reader() and it will work with both. Since those two readers are different
instances of a generic type, that function would need to accept anytype and thus be generic itself.
If it needs to store a reference to that reader then the entire type containing that function would
need to be generic. That can lead to a lot of generated code.
std.io.StreamSource exists to solve that issue for the most common case I explained above. It is a
union that can wrap either a buffer or a file and then provide one streaming API for both in the
form of already described read(), write() and seek methods and methods to fetch the Reader,
Writer and SeekableStream.
With it you can now write non-generic loader functions that accept either the whole StreamSource
or just StreamSource.Reader, and they will support reading from either a file or a memory buffer, but
they will not, for example, support reading from std.net.Stream in any way.
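The idea can be sketched with a tagged union. This is not the std implementation: BufferStream and FakeFile are hypothetical stand-ins for FixedBufferStream and File, and only read() is shown.

```zig
const std = @import("std");

/// Hypothetical stand-in for FixedBufferStream: reads from a slice.
const BufferStream = struct {
    data: []const u8,
    pos: usize = 0,

    pub fn read(self: *BufferStream, dest: []u8) error{}!usize {
        const n = @min(dest.len, self.data.len - self.pos);
        @memcpy(dest[0..n], self.data[self.pos..][0..n]);
        self.pos += n;
        return n;
    }
};

/// Hypothetical stand-in for File: yields `remaining` copies of `byte`.
const FakeFile = struct {
    byte: u8,
    remaining: usize,

    pub fn read(self: *FakeFile, dest: []u8) error{Unexpected}!usize {
        const n = @min(dest.len, self.remaining);
        @memset(dest[0..n], self.byte);
        self.remaining -= n;
        return n;
    }
};

/// One concrete type for both sources, so code using it need not be generic.
const StreamSource = union(enum) {
    buffer: BufferStream,
    file: FakeFile,

    pub const ReadError = error{Unexpected};

    pub fn read(self: *StreamSource, dest: []u8) ReadError!usize {
        switch (self.*) {
            .buffer => |*b| return b.read(dest),
            .file => |*f| return f.read(dest),
        }
    }
};

/// A non-generic "loader": counts the bytes left in any StreamSource.
fn countBytes(source: *StreamSource) StreamSource.ReadError!usize {
    var chunk: [4]u8 = undefined;
    var total: usize = 0;
    while (true) {
        const n = try source.read(&chunk);
        if (n == 0) return total;
        total += n;
    }
}
```

The price is visible in the union itself: the set of supported streams is closed, so adding a network stream would mean editing StreamSource rather than just passing a new type.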
Analysis
File and std.net.Stream wrap the functionality offered by the operating system with a simple
and straightforward API, so I don’t see much room for alternatives there. Maybe some of the methods that
currently only exist on File could make sense for std.net.Stream as well, like readAll and
writeAll, but that is it.
Now Reader, Writer and SeekableStream seem a bit odd. They look like interfaces but they are
generic types, meaning each underlying stream that wants to provide them needs to define its own
specific instance of those generics. That in turn means that any function that wants to receive any
Reader, for example, has to actually receive a parameter of type anytype and then just
assume or check that the given type has all the Reader methods it needs. The same goes for the other two
interfaces.
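A small sketch of what that looks like in practice. The two reader types here are hypothetical, and the @hasDecl check assumes the argument is a struct value rather than a pointer.

```zig
const std = @import("std");

/// Generic over any "reader": the compiler emits one copy per argument type.
fn firstByte(reader: anytype) !u8 {
    const R = @TypeOf(reader);
    // Assumes R is a struct type; @hasDecl does not accept pointer types.
    if (!@hasDecl(R, "read")) {
        @compileError(@typeName(R) ++ " does not look like a reader: no read() method");
    }
    var one: [1]u8 = undefined;
    const n = try reader.read(&one);
    if (n == 0) return error.EndOfStream;
    return one[0];
}

/// Hypothetical reader over a slice; `data` points at the remaining bytes.
const SliceReader = struct {
    data: *[]const u8,

    pub fn read(self: SliceReader, buffer: []u8) error{}!usize {
        const rest = self.data.*;
        const n = @min(buffer.len, rest.len);
        @memcpy(buffer[0..n], rest[0..n]);
        self.data.* = rest[n..];
        return n;
    }
};

/// Hypothetical reader that produces an endless stream of zero bytes.
const ZeroReader = struct {
    pub fn read(self: ZeroReader, buffer: []u8) error{}!usize {
        _ = self;
        @memset(buffer, 0);
        return buffer.len;
    }
};
```

Nothing in the signature of firstByte says "reader". The requirement lives only in the body and in whatever checks the author remembers to write.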
Having SeekableStream as a separate struct doesn’t make sense from a usage point of view. You
never need seeking on its own. You need it in combination with reading or writing. If some
function needs a seekable reader it has to receive both an anytype Reader and an anytype
SeekableStream: two things that actually refer to the same underlying stream.
Currently std.io.BufferedReader is implemented so that it can wrap any other Reader and
add buffering on top, but if you then try to use it with a SeekableStream from the
original stream it will not work. BufferedReader itself doesn’t provide a way to do the seeking.
The only reason I see that it is now separated is the fact that some streams support seeking, like
File, while some like std.net.Stream don’t and there wasn’t an easy way to sometimes create a
Reader with and sometimes without seeking methods. Same goes for the Writer.
Another oddity of SeekableStream is that it doesn’t add any additional behavior. As far as I can see
there is really no need for it at all in this form. If any function that needs a SeekableStream
actually has to receive anytype and then check that it has seek methods, we can always just pass
the original stream as a parameter since it already has those methods. In principle the original
stream could name them differently, since they are passed as comptime parameters to the
generic SeekableStream struct, but I saw no example where that was actually needed.
Another important problem of creating these Reader, Writer and SeekableStream abstractions
over different types of streams is that those streams often return different error sets from their
read(), write() and seek methods. Sometimes they don’t even return any error at all.
That is the main reason those abstractions need to remain generic. The only other option is to allow
methods in abstractions to return anyerror and thus lose information about specific possible
errors.
Alternatives
1. Just make SeekableStream a part of Reader and Writer
We can easily solve this with the help of mixins. If you don’t know what they are or how they work in Zig you can read my previous post.
The solution would look something like this for the Reader:
| |
Note that I am not passing seek methods explicitly like SeekableStream currently does since, as I
said, I didn’t find an example where they are called differently and can’t be used directly. If
there are other reasons to do that then we would need to go with the second option and pass another
six parameters that SeekableStream now receives.
If we don’t pass those methods we should probably add checks that Context does contain seek methods
and report a nice @compileError if it doesn’t.
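Here is one way this could look. It is only a sketch under several assumptions: it targets a Zig release that still has usingnamespace (the keyword was removed in 0.15), Context is assumed to be a struct type so that @hasDecl can inspect it, only two seek methods are forwarded, and it silently omits the seek methods instead of raising a @compileError. MemFile and Pipe are hypothetical streams.

```zig
const std = @import("std");

/// Hypothetical mixin: forwards seeking to the context stored in Self.
fn SeekMethods(comptime Self: type) type {
    return struct {
        pub fn seekTo(self: Self, pos: u64) !void {
            return self.context.seekTo(pos);
        }

        pub fn getPos(self: Self) !u64 {
            return self.context.getPos();
        }
    };
}

fn Reader(
    comptime Context: type,
    comptime ReadError: type,
    comptime readFn: fn (context: Context, buffer: []u8) ReadError!usize,
) type {
    return struct {
        context: Context,

        const Self = @This();

        // Mix the seek methods in only when the context itself can seek.
        pub usingnamespace if (@hasDecl(Context, "seekTo") and @hasDecl(Context, "getPos"))
            SeekMethods(Self)
        else
            struct {};

        pub fn read(self: Self, buffer: []u8) ReadError!usize {
            return readFn(self.context, buffer);
        }
    };
}

/// Hypothetical seekable stream: a small handle pointing at shared state.
const MemFile = struct {
    state: *State,

    pub const State = struct { data: []const u8, pos: usize = 0 };

    pub fn read(self: MemFile, buffer: []u8) error{}!usize {
        const s = self.state;
        const n = @min(buffer.len, s.data.len - s.pos);
        @memcpy(buffer[0..n], s.data[s.pos..][0..n]);
        s.pos += n;
        return n;
    }

    pub fn seekTo(self: MemFile, pos: u64) error{OutOfRange}!void {
        if (pos > self.state.data.len) return error.OutOfRange;
        self.state.pos = @intCast(pos);
    }

    pub fn getPos(self: MemFile) error{}!u64 {
        return self.state.pos;
    }

    pub fn reader(self: MemFile) Reader(MemFile, error{}, read) {
        return .{ .context = self };
    }
};

/// Hypothetical stream that cannot seek; its Reader gets no seek methods.
const Pipe = struct {
    pub fn read(self: Pipe, buffer: []u8) error{}!usize {
        _ = self;
        _ = buffer;
        return 0;
    }

    pub fn reader(self: Pipe) Reader(Pipe, error{}, read) {
        return .{ .context = self };
    }
};
```

With this, a function that needs a seekable reader receives a single value and calls both read() and seekTo() on it.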
Currently BufferedReader is defined like this:
| |
Just like above we can make a BufferedSeekMethods mixin and mix it in only if ReaderType has seek
methods. That way if the passed-in ReaderType has seek methods, so will its BufferedReader. That is
one problem solved.
Additionally this would make it easy to write the StreamSource so that it wraps
File.BufferedReader instead of File directly.
Another thing we can do is add something like this to the Reader:
| |
If some function wants to check whether the anytype parameter passed to it is a Reader it can just
check @hasDecl(ReaderType, "readerInterfaceId"), and the value of that declaration could maybe be
used in some @compileError messages. The SeekMethods mixin could additionally add a
seekableInterfaceId so we can use the same approach to check whether it also has seek methods.
The other problems coming from these being generic types remain but they might be insignificant.
Migrating the standard library and existing code to this solution shouldn’t be too hard. Mostly it
would involve deleting SeekableStream and its usages and std lib isn’t using it anyway.
2. Use VTable interface like Allocator
Allocator is an interface to any concrete allocator implementation but it isn’t a generic type. It is a plain struct that contains an opaque pointer to the actual allocator and a vtable struct with pointers to internal wrapping methods that also receive an opaque pointer as first parameter. Some comptime magic is only used to generate those internal wrapping methods so they cast that first parameter to the type of each concrete allocator implementation before calling its own method.
The Allocator struct then also provides additional common utility methods that use some of the basic three that must be provided by the implementation: alloc, free and resize. So this approach still allows adding additional behavior.
Now all allocator implementations can only return one error from their alloc() method and that is
OutOfMemory. That allows this single Allocator struct to have a wrapping method that also can
return just this error and still be the common interface for any concrete implementation.
As already mentioned, different streams return different kinds of errors from their read, write
and seek methods so the only way this pattern could work is if the common interfaces return
anyerror. That also means that any library function that works with a Reader or Writer also
needs to return anyerror thus losing the ability to document and help its user properly handle its
specific errors.
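For reference, this is roughly what the pattern would look like for a reader. It is a sketch modeled on the ptr plus vtable layout of std.mem.Allocator, and AnyReader, BufferStream and countBytes are hypothetical names.

```zig
const std = @import("std");

/// Hypothetical type-erased reader in the style of std.mem.Allocator.
const AnyReader = struct {
    ptr: *anyopaque,
    vtable: *const VTable,

    pub const VTable = struct {
        read: *const fn (ptr: *anyopaque, buffer: []u8) anyerror!usize,
    };

    /// Generates the wrapper that casts the opaque pointer back to *T.
    pub fn init(comptime T: type, stream: *T) AnyReader {
        const gen = struct {
            fn readImpl(ptr: *anyopaque, buffer: []u8) anyerror!usize {
                const self: *T = @ptrCast(@alignCast(ptr));
                return self.read(buffer);
            }
            const vtable = VTable{ .read = readImpl };
        };
        return .{ .ptr = stream, .vtable = &gen.vtable };
    }

    /// The concrete error set is erased here: callers only see anyerror.
    pub fn read(self: AnyReader, buffer: []u8) anyerror!usize {
        return self.vtable.read(self.ptr, buffer);
    }
};

/// Hypothetical stream that reads from a slice.
const BufferStream = struct {
    data: []const u8,
    pos: usize = 0,

    pub fn read(self: *BufferStream, dest: []u8) error{}!usize {
        const n = @min(dest.len, self.data.len - self.pos);
        @memcpy(dest[0..n], self.data[self.pos..][0..n]);
        self.pos += n;
        return n;
    }
};

/// Not generic, but forced to return anyerror as well.
fn countBytes(reader: AnyReader) anyerror!usize {
    var chunk: [4]u8 = undefined;
    var total: usize = 0;
    while (true) {
        const n = try reader.read(&chunk);
        if (n == 0) return total;
        total += n;
    }
}
```

BufferStream.read() cannot fail at all, yet countBytes() still has to advertise anyerror, which is exactly the loss of information described above.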
EDIT:
I could not find a way to make anything useful of this solution, but @InKrypton commented that we could define a VTable that is generic only over errors, something like this:
| |
That can enable us to write functions that need to receive any writer like this:
| |
So instead of receiving anytype we can actually say that it needs to receive a Writer and we
just need to specify what errors that Writer can return. This makes the API clearer and doesn’t
require any additional checks on what methods are available.
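My reading of that suggestion, as a sketch: the interface is a function of the error set only, and the stream type is erased behind an opaque pointer. The names Writer, init, FixedBuf and greet are hypothetical, and a single function pointer is stored directly instead of a separate vtable to keep the example short.

```zig
const std = @import("std");

/// Sketch of a writer interface that is generic only over its error set.
fn Writer(comptime WriteError: type) type {
    return struct {
        ptr: *anyopaque,
        writeFn: *const fn (ptr: *anyopaque, bytes: []const u8) WriteError!usize,

        const Self = @This();
        pub const Error = WriteError;

        /// Wraps any *T whose write() returns a subset of WriteError.
        pub fn init(comptime T: type, stream: *T) Self {
            const gen = struct {
                fn writeImpl(ptr: *anyopaque, bytes: []const u8) WriteError!usize {
                    const self: *T = @ptrCast(@alignCast(ptr));
                    return self.write(bytes);
                }
            };
            return .{ .ptr = stream, .writeFn = gen.writeImpl };
        }

        pub fn write(self: Self, bytes: []const u8) Error!usize {
            return self.writeFn(self.ptr, bytes);
        }

        /// Added behavior: loops until every byte has been written.
        pub fn writeAll(self: Self, bytes: []const u8) Error!void {
            var index: usize = 0;
            while (index < bytes.len) {
                index += try self.write(bytes[index..]);
            }
        }
    };
}

/// Hypothetical stream with room for eight bytes.
const FixedBuf = struct {
    data: [8]u8 = undefined,
    len: usize = 0,

    pub const WriteError = error{NoSpaceLeft};

    pub fn write(self: *FixedBuf, bytes: []const u8) WriteError!usize {
        const free = self.data.len - self.len;
        if (free == 0 and bytes.len > 0) return error.NoSpaceLeft;
        const n = @min(bytes.len, free);
        @memcpy(self.data[self.len..][0..n], bytes[0..n]);
        self.len += n;
        return n;
    }
};

/// Not generic over the stream type; the signature states the possible errors.
fn greet(writer: Writer(FixedBuf.WriteError)) FixedBuf.WriteError!void {
    try writer.writeAll("hello");
}
```

Any stream whose write() errors fit into the declared set can be wrapped, and a stream with extra errors is rejected at compile time when its wrapper is generated.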
We should be able to still apply the solution from the previous point to merge SeekableStream into
Reader and Writer interfaces that are defined like this.
Migrating to this solution would require a bit more work but shouldn’t be complicated to do.
3. Make streams be Readers and Writers using mixins
Again if you don’t know what mixins are or how they are done in Zig you can read about it here.
In this approach there would not be a separate Reader, Writer and SeekableStream. There would
just be ReaderMethods and WriterMethods mixins that provide the common additional methods that
Reader and Writer currently provide. Something like this:
| |
File and std.net.Stream would now mixin those methods into their own implementation and File
would still have those additional seeking methods. Functions that need to receive ‘some seekable
reader’ for example, would provide one argument of type anytype and just like now check if all the
methods needed, both reading and seeking, are defined on the given type.
We could use the trick I mentioned in the first alternative, where ReaderMethods also define
| |
so that checking if some ReaderType: anytype is a reader can be done like this
| |
For that purpose we could also define SeekerMethods() mixin that only checks if the given Self
type has all the seek methods and only mixes in
| |
and doesn’t add any new methods.
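Putting the pieces of this alternative together, a sketch could look like the following. It assumes a Zig release that still has usingnamespace (removed in 0.15). The value of readerInterfaceId is made up, and MemStream and secondByte are hypothetical.

```zig
const std = @import("std");

/// Sketch of a mixin: methods built on top of Self.read().
fn ReaderMethods(comptime Self: type) type {
    return struct {
        /// Marker that generic code can test for; the value is made up.
        pub const readerInterfaceId = "reader";

        pub fn readAll(self: *Self, buffer: []u8) !usize {
            var index: usize = 0;
            while (index < buffer.len) {
                const n = try self.read(buffer[index..]);
                if (n == 0) break;
                index += n;
            }
            return index;
        }

        pub fn readByte(self: *Self) !u8 {
            var one: [1]u8 = undefined;
            const n = try self.read(&one);
            if (n == 0) return error.EndOfStream;
            return one[0];
        }
    };
}

/// Hypothetical stream: it is itself the reader, no wrapper object needed.
const MemStream = struct {
    data: []const u8,
    pos: usize = 0,

    pub fn read(self: *MemStream, buffer: []u8) error{}!usize {
        const n = @min(buffer.len, self.data.len - self.pos);
        @memcpy(buffer[0..n], self.data[self.pos..][0..n]);
        self.pos += n;
        return n;
    }

    pub usingnamespace ReaderMethods(MemStream);
};

/// Generic function that checks the marker before using the stream.
fn secondByte(comptime S: type, stream: *S) !u8 {
    if (!@hasDecl(S, "readerInterfaceId")) {
        @compileError(@typeName(S) ++ " is not a reader");
    }
    _ = try stream.readByte();
    return stream.readByte();
}
```

The stream is passed directly, so there is no separate reader object whose lifetime or identity the caller has to think about.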
BufferedReader would provide its own read() method that wraps the original one with additional
logic just like now and would also mixin ReaderMethods. If the passed in reader has seek methods
it could also wrap those using something like this:
| |
This approach has even less indirection than the current implementation and is easy to implement and use. Functions that receive these streams as arguments would still be generic, as in the current solution, so they would generate a similar amount of code.
One issue with this approach is that the struct that wants to use these mixins must have a read method with this exact signature:
| |
Same goes for the write() method. Both File and std.net.Stream do have that but currently in
std lib we also have:
- std.fifo.LinearFifo, which has a read method whose return type is just usize and not an error union, so for the current Reader it defines a separate method called readFn that just calls read but returns an error union with an empty error set, like this: error{}!usize.
- std.os.uefi.protocols.FileProtocol, which has a read method with a slightly different signature, so it also defines a separate readFn method for the Reader that has the proper signature and marshals the call to the actual read.
Note that the reason they must have a read() method that returns an error union is not the check we
added in the ReaderMethods mixin but the fact that code that uses “any reader” may call try reader.read(...), and a try expression will not compile unless read() returns an error union.
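The situation can be reproduced with a small hypothetical Fifo type that mirrors the read plus readFn arrangement described above:

```zig
const std = @import("std");

/// Hypothetical queue-like type whose read() cannot fail.
const Fifo = struct {
    data: []const u8,
    pos: usize = 0,

    /// Natural signature: plain usize, no error union.
    pub fn read(self: *Fifo, dest: []u8) usize {
        const n = @min(dest.len, self.data.len - self.pos);
        @memcpy(dest[0..n], self.data[self.pos..][0..n]);
        self.pos += n;
        return n;
    }

    /// Adapter with the signature that generic reader code expects.
    pub fn readFn(self: *Fifo, dest: []u8) error{}!usize {
        return self.read(dest);
    }
};

/// Generic code written against "any reader" uses try on every read.
fn readTwo(stream: anytype, dest: *[2]u8) !usize {
    const first = try stream.readFn(dest[0..1]);
    const second = try stream.readFn(dest[1..2]);
    // `try stream.read(...)` would be rejected for Fifo: try needs an
    // error union operand, and Fifo.read returns a bare usize.
    return first + second;
}
```

Under the third alternative the mixin would call read() directly, so a type like this would have to give its infallible method another name and let read() carry the error union signature.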
I am not sure how acceptable it would be to change the main read() method in those structs to
align with the Reader interface and then provide their current read() method under a different
name.
Migrating existing code to this alternative also wouldn’t be too hard and would mostly involve
deleting code. We would need to delete all reader(), writer() and seekableStream() methods and
probably just replace their calls with the object they were called on. For example:
| |
StreamSource in this alternative would still wrap read(), write() and seek methods just like
now and also mix in ReaderMethods, WriterMethods and SeekerMethods.
Conclusion
Although the second alternative looked unusable at first, @InKrypton’s suggestion made it a valid solution. It probably does the most to make the APIs of functions as clear as possible. It has a bit more indirection and it didn’t completely remove generics, but it made them more ergonomic.
The first alternative, which just merges SeekableStream into Reader and Writer, seems to me like
a clear improvement with no downsides over the existing solution. I think that part is something we
should do no matter what we decide on the rest of the suggestions.
Personally I like the third alternative the most. It gives the most potential for clearing up both
the stream implementations and their usage code. I am just not sure how acceptable it is to require
that every stream implements the exact read() and write() methods that are required by the
interfaces.
What do you think? Do you have further arguments for or against any of these solutions? Or maybe you have an idea for a new one? Join us on the Ziggit forum and share your opinion.