From: Mathieu Desnoyers on
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers(a)efficios.com>
---
Documentation/ring-buffer/ring-buffer-design.txt | 78 ++++++
Documentation/ring-buffer/ring-buffer-usage.txt | 260 +++++++++++++++++++++++
2 files changed, 338 insertions(+)

Index: linux.trees.git/Documentation/ring-buffer/ring-buffer-design.txt
===================================================================
--- /dev/null 1970-01-01 00:00:00.000000000 +0000
+++ linux.trees.git/Documentation/ring-buffer/ring-buffer-design.txt 2010-07-02 12:34:02.000000000 -0400
@@ -0,0 +1,78 @@
+ Ring Buffer Library Design
+
+ Mathieu Desnoyers
+
+
+This document explains Linux Kernel Ring Buffer library.
+
+
+* Purpose of the ring buffer library
+
+Tracing: the main purpose of the ring buffer library is to perform tracing
+efficiently by providing an efficient ring buffer to transport trace data.
+
+Fast fifo queue for drivers: this library is meant to be generic enough to meet
+the requirements of audio, video and other drivers to provide an easy-to-use,
+yet efficient, buffering API.
+
+Lock-free write-side: the main advantage of this ring buffer implementation is
+that it provides non-blocking synchronization for the writer context. It
+furthermore provides a bounded write-side execution time for real-time
+applications. The per-CPU buffer configuration is wait-free. The global buffer
+configuration is lock-free. (wait-free is a stronger progress guarantee than
+lock-free.)
+
+
+* Semantic
+
+The execution context writing to the ring buffer is hereby called "producer" (or
+writer) and the thread reading the ring buffer content is called "consumer" (or
+reader). Each instance of either per-cpu or global ring buffers is called a
+"channel". A buffer is divided into subbuffers, which are synchronization points
+in the buffers (sometimes referred to as periods in the audio world). Each item
+stored in the ring buffer is called a "record". Both subbuffers and records
+may start with a "header". Records can also contain a variable-sized payload.
+
+The ring buffer supports two write modes. The "discard" mode drops data when the
+ring buffer is full. The "overwrite" (a.k.a. flight recorder) mode overwrites
+the oldest information when the ring buffer is full.
+
+Iterators are one way to consume data from the ring buffer. They allow a reader
+thread to read records one by one in the order they were written, either on a
+per-buffer or per-channel basis. Other ways to consume data are by using file
+descriptors which provide access to raw subbuffer content through, e.g.,
+splice() or mmap().
+
+
+* Programmer Interfaces
+
+The library presents a high-level interface that allows programmers to easily
+create and use a ring buffer instance. It also provides a more advanced client
+configuration API for clients with more elaborate needs (e.g. tracers).
+
+
+* Advanced client configuration options
+
+The options listed in the linux/ringbuffer/config.h header are tailored for ring
+buffer "clients" (a kernel object using the ring buffer library through its
+advanced options API) with more specific needs. The clients must set up a
+"static const" ring_buffer_config structure in which all options are spelled
+out. Given that this structure is known to be immutable, compiler optimizations
+can optimize away all the unneeded code from the library inline fast paths. The
+slow paths, however, dynamically select the correct code depending on the
+ring_buffer_config structure received as parameter. This saves space by sharing
+the slow path code between all ring buffer clients.
+
+
+* Frontend/backend layered design
+
+The ring buffer is made of two main layers: a frontend and a backend. The
+"frontend" locklessly manages space reservation within the buffer. It also
+manages timers, idle and cpu hotplug. The "backend" manages the memory backend
+used to allocate the buffers. It deals with subbuffer exchanges between the
+consumer and the producer in overwrite mode. Currently, only a page-based
+backend is implemented (RING_BUFFER_PAGE), but other backends are planned for
+the future: statically allocated backends (RING_BUFFER_STATIC) and vmap-based
+backends (RING_BUFFER_VMAP). These will allow, for instance, tracers to write
+trace data in a physically contiguous memory region allocated at boot time, or
+to write data in video card memory for crash reports.
Index: linux.trees.git/Documentation/ring-buffer/ring-buffer-usage.txt
===================================================================
--- /dev/null 1970-01-01 00:00:00.000000000 +0000
+++ linux.trees.git/Documentation/ring-buffer/ring-buffer-usage.txt 2010-07-02 12:35:20.000000000 -0400
@@ -0,0 +1,260 @@
+ Ring Buffer Library Usage
+
+ Mathieu Desnoyers
+
+
+This document explains how to use the Linux Kernel Ring Buffer Library.
+
+The library presents a high-level interface that allows programmers to easily
+create and use a ring buffer instance. It also provides a more advanced client
+configuration API for clients with more elaborate needs (e.g. tracers).
+
+
+* Basic ring buffer configurations
+
+ The basic high-level configurations offered are pre-built clients with the
+following configuration selections under include/linux/ringbuffer/.
+
+ * The write-side (data producer) APIs are available in:
+
+ - global_overwrite.h:
+ global buffer, overwrite mode, channel-wide record iterator
+
+ - global_discard.h:
+ global buffer, discard mode, channel-wide record iterator
+
+ - percpu_overwrite.h:
+ per-cpu buffers, overwrite mode, channel-wide record iterator
+
+ - percpu_discard.h:
+ per-cpu buffers, discard mode, channel-wide record iterator
+
+ - percpu_local_overwrite.h:
+ per-cpu buffers, overwrite mode, per-cpu buffer record iterator
+
+ - percpu_local_discard.h:
+ per-cpu buffers, discard mode, per-cpu buffer record iterator
+
+ Typical use-case of the ring buffer write-side:
+
+ 1) create
+ 2) multiple calls to the write primitive.
+ 3) destroy
+
+
+ * The read-side (data consumer) iterator APIs are available in:
+
+ - iterator.h
+
+ These iterators allow to iterate on records either on a per-cpu buffer or
+ channel-wide basis.
+
+ Typical life-span of a reader using the file descriptor read() iterator:
+
+ (in user-space)
+ # cat /path_to_file/filename
+
+ Typical life-span of a reader using the in-kernel API:
+
+ 1) iterator_open()
+ 2) get_next_record and read_current_record until get_next_record returns
+ -ENODATA. -EAGAIN means there is currently no data, but there might be
+ more data coming in the future.
+ 3) iterator_close()
+
+
+* Advanced client configurations
+
+ * Advanced client configuration options
+
+ More options are available for clients with more advanced needs. These options
+are listed in the linux/ringbuffer/config.h header. A ring buffer "client" (a
+kernel object using the ring buffer library through its advanced options API)
+must set up a "static const" ring_buffer_config structure in which all options
+are spelled out.
+
+The pre-built basic configurations presented in the above set these advanced
+configuration options to values typically correct for driver use.
+
+A client using the advanced configuration options must first include
+linux/ringbuffer/config.h, declare its configuration structure, declare the
+required static inline functions used by the fast-paths, and then include
+linux/ringbuffer/api.h.
+
+The struct ring_buffer_config options are:
+
+ * alloc: RING_BUFFER_ALLOC_PER_CPU / RING_BUFFER_ALLOC_GLOBAL
+
+ Selects either global buffer or per-cpu ring buffers.
+
+ * sync: RING_BUFFER_SYNC_PER_CPU / RING_BUFFER_SYNC_GLOBAL
+
+ Selects which synchronization primitives must be used. Either expect
+ concurrency from other processors, or expect to only have concurrency with
+ the local processor. Separated from the "alloc" option because per-thread
+ buffers would fit in the "global alloc, per-cpu sync". Similarly, per-cpu
+ buffers written to with preemption enabled would fit in the "per-cpu
+ alloc, global sync" category, because migration could lead to a concurrent
+ write into a remote cpu buffer.
+
+ * mode: RING_BUFFER_OVERWRITE / RING_BUFFER_DISCARD
+
+ Either overwrite oldest subbuffers when buffer is full, or discard events.
+
+ * align: RING_BUFFER_NATURAL / RING_BUFFER_PACKED
+
+ Natural alignment aligns record headers on their natural alignment on the
+ architecture. It also aligns record payload on their natural alignment
+ (similarly to a C structure). The packed option does not perform any
+ alignment for record header and payloads. It corresponds to the "packed" gcc
+ type attribute.
+
+ * output:
+
+ RING_BUFFER_SPLICE: Output raw subbuffers through per-buffer file
+ descriptors with splice(). The read-side
+ synchronization needed to select the current
+ subbuffer is performed with ioctl().
+
+ RING_BUFFER_MMAP: Output raw subbuffers through per-buffer memory
+ mapped file descriptors. Read-side synchronization
+ to select the current subbuffer is performed with
+ ioctl().
+
+ RING_BUFFER_READ: Output raw subbuffers through per-buffer file
+ descriptors with read(). The read-side
+ synchronization needed to select the current
+ subbuffer is performed with ioctl().
+ (unimplemented)
+
+ RING_BUFFER_ITERATOR: Iterators allow a reader thread to read records one
+ by one in the order they were written, either on a
+ per-buffer or per-channel basis.
+
+ RING_BUFFER_NONE: No output provided by the library is used.
+
+ * backend:
+
+ RING_BUFFER_PAGE: The memory backend used to hold the ring buffers is
+ made of non-contiguous pages. A software-controlled
+ "subbuffer table" indexes the pages. It allows
+ sub-buffer exchange between the producer and
+ consumer in overwrite mode.
+
+ RING_BUFFER_VMAP: A vmap'd virtually contiguous memory area is used as
+ memory backend. (unimplemented)
+
+ RING_BUFFER_STATIC: A physically contiguous memory area is used as
+ memory backend. e.g. memory allocated at early boot,
+ or video card memory. (unimplemented)
+
+ * oops:
+ Select "oops" consistency if you plan to read from the ring buffer
+ after a kernel oops occurred. This is useful if you plan to use the
+ ring buffer data in a crash report. Adds a slight performance overhead
+ to keep track of how much contiguous data has been written in the
+ current subbuffer.
+
+ * ipi:
+ The IPI_BARRIER scheme issues IPIs when the consumer needs to grab a
+ sub-buffer. It issues the appropriate memory barriers on the writer
+ CPU(s). It is therefore possible to turn the memory barrier in the
+ commit fast-path into a simple compiler barrier, thus improving
+ performances. This scheme is recommended when both per-cpu allocation
+ and synchronization are used. This scheme is not recommended for
+ "global" buffers, because it would involve sending IPIs to all
+ processors.
+
+ * wakeup:
+ The option "RING_BUFFER_WAKEUP_BY_TIMER" reduces intrusiveness in
+ the writer code and guarantees wait-free/lock-free write primitives
+ by performing lazy reader wakeups in a periodic deferrable timer and
+ hooking into cpu idle notifiers. This option makes tracer code more
+ robust at the expense of additional data delivery delay.
+ Use in combination with "read_timer_interval" channel_create()
+ argument.
+ - Note: CPU idle notifiers are not implemented for all
+ architectures at the moment. The deferrable timer delays can
+ only expected to be met by architectures with idle notifiers.
+ RING_BUFFER_WAKEUP_BY_WRITER option specifies that the ring buffer
+ write-side must perform reader wakeups at each sub-buffer boundary.
+ RING_BUFFER_WAKEUP_NONE does not perform any wakeup whatsoever. The
+ client has the responsibility to perform wakeups.
+
+ * tsc_bits:
+ Timestamp compression scheme setting. 0 means that no timestamps
+ are used; 64 means that full 64-bit timestamps are written with
+ each record. For any value between 1 and 63, the ring buffer
+ library will set the RING_BUFFER_RFLAG_FULL_TSC bit in the
+ "rflags" ring_buffer_ctx field, which is also passed as parameter
+ passed to the "record_header_size()" callback to inform the client
+ that a full 64-bit timestamp is needed due to a "tsc_bits"
+ overflow since the last record.
+
+Some options are passed as parameter to channel_create():
+
+ * subbuf_size:
+ Size of a sub-buffer within a ring buffer. Extra synchronization is
+ performed when the data producer crosses sub-buffer boundaries. This
+ corresponds to "periods" in audio buffers. The maximum record size is
+ limited by the sub-buffer size. The minimum sub-buffer size is 1 page.
+
+ * num_subbuf:
+ Number of sub-buffers per buffer. Typically, using at least 2
+ sub-buffers is recommended to minimize record discards.
+
+ * switch_timer_interval:
+ The switch timer interval configures the periodical deferrable
+ timer which handles periodical buffer switch. It is used to make
+ data readily available for consumption periodically for live data
+ streaming. A buffer switch is a synchronization point between the data
+ producers and consumer.
+
+ * read_timer_interval:
+ The read timer interval is the time interval (in us) to wake up pending
+ readers.
+
+* Advanced client callbacks
+
+ These callbacks are configured by the cb field of the ring_buffer_config
+structure. They are provided to the ring buffer by the client. For both
+ring_buffer_clock_read() and record_header_size(), inline versions must also be
+provided before inclusion of linux/ringbuffer/api.h.
+
+ * ring_buffer_clock_read():
+ Returns the current ring buffer clock source time (64-bit value).
+
+ * record_header_size():
+ Returns the size of the current record size, including record header
+ size. It uses the "rflags" parameter to determine if a full 64-bit
+ timestamp is required or if "tsc_bits" bits are enough to represent the
+ current time and detect "tsc_bits"-bit overflow. The offset received as
+ parameter is relative to a page boundary, which allows alignment
+ calculation. data_size is the size of the event payload.
+ "pre_header_padding" can be set by record_header_size() to the amount of
+ padding required to align the record header (considered to be 0 if
+ unset).
+
+ * subbuffer_header_size():
+ Returns the size of the subbuffer header.
+
+ * buffer_begin():
+ Callback executed when crossing a sub-buffer boundary, when starting to
+ write into the sub-buffer.
+
+ * buffer_end():
+ Callback executed when crossing a sub-buffer boundary, before delivering
+ a sub-buffer. Has exclusive sub-buffer access when called; meaning that
+ no concurrent commits are left, no reader can access the sub-buffer, no
+ concurrent writers are allowed to overwrite the sub-buffer.
+
+ * buffer_create():
+ This callback is executed upon creation of a buffer, either at channel
+ creation, or at CPU hotplug.
+
+ * buffer_finalize():
+ Callback executed upon channel finalize, performed by channel_destroy().
+
+ * record_get():
+ Reader helper provided by the client, which can be used to extract the
+ record header from a record in the buffer.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/