patch-2.3.47 linux/Documentation/DMA-mapping.txt

diff -u --recursive --new-file v2.3.46/linux/Documentation/DMA-mapping.txt linux/Documentation/DMA-mapping.txt
@@ -28,19 +28,126 @@
 
 #include <linux/pci.h>
 
-is in your driver. This file defines a dma_addr_t type which should be
-used everywhere you hold a DMA (bus) address returned from the DMA mapping
-functions.
+is in your driver. This file provides the definition of the
+dma_addr_t type, which should be used everywhere you hold a DMA
+(bus) address returned from the DMA mapping functions.
+
+			DMA addressing limitations
+
+Does your device have any DMA addressing limitations?  For example, is
+your device only capable of driving the low-order 24 bits of address
+on the PCI bus for DMA transfers?  If your device can handle any PCI
+DMA address fully, then please skip to the next section; the rest of
+this section does not concern your device.
+
+For correct operation, you must interrogate the PCI layer in your
+device probe routine to see if the PCI controller on the machine can
+properly support the DMA addressing limitation your device has.  This
+query is performed via a call to pci_dma_supported():
+
+	int pci_dma_supported(struct pci_dev *pdev, dma_addr_t device_mask)
+
+Here, pdev is a pointer to the PCI device struct of your device, and
+device_mask is a bit mask describing which bits of a PCI address your
+device supports.  It returns non-zero if your card can perform DMA
+properly on the machine.  If it returns zero, your device cannot
+perform DMA properly on this platform, and attempting to do so will
+result in undefined behavior.
+
+In the failure case, you have two options:
+
+1) Use some non-DMA mode for data transfer, if possible.
+2) Ignore this device and do not initialize it.
+
+It is recommended that your driver print a KERN_WARNING kernel message
+when you do one of these two things.  In this manner, if a user of
+your driver reports that performance is bad or that the device is not
+even detected, you can ask them for the kernel messages to find out
+exactly why.
+
+So if, for example, your device can only drive the low 24 bits of
+address during PCI bus mastering, you might do something like:
+
+	if (! pci_dma_supported(pdev, 0x00ffffff))
+		goto ignore_this_device;
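+
+As a minimal sketch of how the failure path might look in a device
+probe routine (the routine name, message text and use of -ENODEV here
+are purely illustrative, not taken from any real driver):
+
+	static int my_driver_probe(struct pci_dev *pdev)
+	{
+		if (! pci_dma_supported(pdev, 0x00ffffff)) {
+			printk(KERN_WARNING "mydrv: 24-bit DMA not supported "
+			       "on this platform, not claiming device.\n");
+			return -ENODEV;	/* ignore this device */
+		}
+
+		/* ... continue with normal device initialization ... */
+		return 0;
+	}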
+
+One case we are aware of at this time is worth mentioning in this
+documentation.  If your device supports multiple functions (for
+example, a sound card providing playback and record functions) and
+the various functions have _different_ DMA addressing limitations,
+you may wish to probe each mask and only provide the functionality
+which the machine can handle.  Here is pseudo-code showing how this
+might be done:
+
+	#define PLAYBACK_ADDRESS_BITS	0xffffffff
+	#define RECORD_ADDRESS_BITS	0x00ffffff
+
+	struct my_sound_card *card;
+	struct pci_dev *pdev;
+
+	...
+	if (pci_dma_supported(pdev, PLAYBACK_ADDRESS_BITS)) {
+		card->playback_enabled = 1;
+	} else {
+		card->playback_enabled = 0;
+		printk(KERN_WARNING "%s: Playback disabled due to DMA limitations.\n",
+		       card->name);
+	}
+	if (pci_dma_supported(pdev, RECORD_ADDRESS_BITS)) {
+		card->record_enabled = 1;
+	} else {
+		card->record_enabled = 0;
+		printk(KERN_WARNING "%s: Record disabled due to DMA limitations.\n",
+		       card->name);
+	}
+
+A sound card was used as an example here because this genre of PCI
+devices seems to be littered with ISA chips given a PCI front end,
+which thus retain the 16MB DMA addressing limitation of ISA.
+
+			Types of DMA mappings
 
 There are two types of DMA mappings:
-- static DMA mappings which are usually mapped at driver initialization,
-  unmapped at the end and for which the hardware should not assume
-  sequential accesses (from both the DMA engine in the card and CPU).
-- streaming DMA mappings which are usually mapped for one DMA transfer,
+
+- Consistent DMA mappings which are usually mapped at driver
+  initialization, unmapped at the end and for which the hardware should
+  guarantee that the device and the cpu can access the data
+  in parallel and will see updates made by each other without any
+  explicit software flushing.
+
+  Think of "consistent" as "synchronous" or "coherent".
+
+  Good examples of what to use consistent mappings for are:
+
+	- Network card DMA ring descriptors.
+	- SCSI adapter mailbox command data structures.
+	- Device firmware microcode executed out of
+	  main memory.
+
+  The invariant these examples all require is that any cpu store
+  to memory is immediately visible to the device, and vice
+  versa.  Consistent mappings guarantee this.
+
+- Streaming DMA mappings which are usually mapped for one DMA transfer,
   unmapped right after it (unless you use pci_dma_sync below) and for which
   hardware can optimize for sequential accesses.
 
-To allocate and map a static DMA region, you should do:
+  Think of "streaming" as "asynchronous" or "outside the coherency
+  domain".
+
+  Good examples of what to use streaming mappings for are:
+
+	- Networking buffers transmitted/received by a device.
+	- Filesystem buffers written/read by a SCSI device.
+
+  The interfaces for using this type of mapping were designed in
+  such a way that an implementation can make whatever performance
+  optimizations the hardware allows.  To this end, when using
+  such mappings you must be explicit about what you want to happen.
+
+		 Using Consistent DMA mappings
+
+To allocate and map a consistent DMA region, you should do:
 
 	dma_addr_t dma_handle;
 
@@ -48,13 +155,18 @@
 
 where dev is a struct pci_dev *. You should pass NULL for PCI like buses
 where devices don't have struct pci_dev (like ISA, EISA).
+
 This argument is needed because the DMA translations may be bus
-specific (and often is private to the bus which the device is attached to).
+specific (and often are private to the bus which the device is attached
+to).
+
 Size is the length of the region you want to allocate.
+
 This routine will allocate RAM for that region, so it acts similarly to
-__get_free_pages (but takes size instead of page order).
-It returns two values: the virtual address which you can use to access it
-from the CPU and dma_handle which you pass to the card.
+__get_free_pages (but takes size instead of a page order).
+
+It returns two values: the virtual address which you can use to access
+it from the CPU and dma_handle which you pass to the card.
 
 The cpu return address and the DMA bus master address are both
 guaranteed to be aligned to the smallest PAGE_SIZE order which
@@ -68,7 +180,66 @@
 	pci_free_consistent(dev, size, cpu_addr, dma_handle);
 
 where dev, size are the same as in the above call and cpu_addr and
-dma_handle are the values pci_alloc_consistent returned.
+dma_handle are the values pci_alloc_consistent returned to you.
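+
+As an illustration, a driver allocating a descriptor ring for its card
+might do something like the following sketch (struct my_card, struct
+my_desc and RING_SIZE are illustrative names, not from any real
+driver):
+
+	#define RING_SIZE	32	/* number of descriptors */
+
+	struct my_card {
+		struct pci_dev	*pdev;
+		struct my_desc	*ring;		/* cpu virtual address */
+		dma_addr_t	ring_dma;	/* bus address for the card */
+	};
+
+	static int my_card_alloc_ring(struct my_card *cp)
+	{
+		cp->ring = pci_alloc_consistent(cp->pdev,
+						RING_SIZE * sizeof(struct my_desc),
+						&cp->ring_dma);
+		if (cp->ring == NULL)
+			return -ENOMEM;
+
+		/* Program cp->ring_dma (not cp->ring!) into the card's
+		 * ring base register here.
+		 */
+		return 0;
+	}
+
+	static void my_card_free_ring(struct my_card *cp)
+	{
+		pci_free_consistent(cp->pdev,
+				    RING_SIZE * sizeof(struct my_desc),
+				    cp->ring, cp->ring_dma);
+	}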
+
+			DMA Direction
+
+The interfaces described in subsequent portions of this document
+take a DMA direction argument, which is an integer and takes on
+one of the following values:
+
+ PCI_DMA_BIDIRECTIONAL
+ PCI_DMA_TODEVICE
+ PCI_DMA_FROMDEVICE
+ PCI_DMA_NONE
+
+You should provide the exact DMA direction if you know it.
+
+PCI_DMA_TODEVICE means "from main memory to the PCI device"
+PCI_DMA_FROMDEVICE means "from the PCI device to main memory"
+
+You are _strongly_ encouraged to specify this as precisely
+as you possibly can.
+
+If you absolutely cannot know the direction of the DMA transfer,
+specify PCI_DMA_BIDIRECTIONAL.  It means that the DMA can go in
+either direction.  The platform guarantees that you may legally
+specify this, and that it will work, but this may come at the
+cost of some performance, for example.
+
+The value PCI_DMA_NONE is to be used for debugging.  You can
+hold this in a data structure before you come to know the
+precise direction, and it will help catch cases where your
+direction tracking logic has failed to set things up properly.
+
+Another advantage of specifying this value precisely (aside from
+potential platform-specific optimizations) is for debugging.
+Some platforms actually have a write permission boolean which DMA
+mappings can be marked with, much like page protections in a user
+program.  Such platforms can and do report errors in the kernel
+logs when the PCI controller hardware detects violation of the
+permission setting.
+
+Only streaming mappings specify a direction; consistent mappings
+implicitly have a direction attribute setting of
+PCI_DMA_BIDIRECTIONAL.
+
+The SCSI subsystem provides mechanisms for you to easily obtain
+the direction to use, in the SCSI command:
+
+	scsi_to_pci_dma_dir(SCSI_DIRECTION)
+
+Where SCSI_DIRECTION is obtained from the 'sc_data_direction'
+member of the SCSI command your driver is working on.  The
+interface mentioned above returns a value suitable for passing
+into the streaming DMA mapping interfaces below.
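+
+For example (a minimal sketch; 'sc' stands for the Scsi_Cmnd your
+driver is currently servicing):
+
+	/* 'sc' is the Scsi_Cmnd being serviced by your driver. */
+	int dma_dir = scsi_to_pci_dma_dir(sc->sc_data_direction);
+
+	/* dma_dir can now be passed as the direction argument to the
+	 * streaming pci_map_*() interfaces described below.
+	 */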
+
+For networking drivers, it's a rather simple affair.  For transmit
+packets, map/unmap them with the PCI_DMA_TODEVICE direction
+specifier.  For receive packets, just the opposite: map/unmap them
+with the PCI_DMA_FROMDEVICE direction specifier.
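+
+A minimal sketch of the transmit side, using the streaming interfaces
+described in the next section (error handling omitted; 'skb' is the
+packet handed to the driver's hard_start_xmit routine):
+
+	dma_addr_t mapping;
+
+	mapping = pci_map_single(pdev, skb->data, skb->len,
+				 PCI_DMA_TODEVICE);
+
+	/* ... later, when the card signals transmit completion ... */
+
+	pci_unmap_single(pdev, mapping, skb->len, PCI_DMA_TODEVICE);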
+
+		  Using Streaming DMA mappings
 
 The streaming DMA mapping routines can be called from interrupt context.
 There are two versions of each map/unmap, one which map/unmap a single
@@ -78,18 +249,18 @@
 
 	dma_addr_t dma_handle;
 
-	dma_handle = pci_map_single(dev, addr, size);
+	dma_handle = pci_map_single(dev, addr, size, direction);
 
 and to unmap it:
 
-	pci_unmap_single(dev, dma_handle, size);
+	pci_unmap_single(dev, dma_handle, size, direction);
 
 You should call pci_unmap_single when the DMA activity is finished, e.g.
 from interrupt which told you the DMA transfer is done.
 
 Similarly with scatterlists, you map a region gathered from several regions by:
 
-	int i, count = pci_map_sg(dev, sglist, nents);
+	int i, count = pci_map_sg(dev, sglist, nents, direction);
 	struct scatterlist *sg;
 
 	for (i = 0, sg = sglist; i < count; i++, sg++) {
@@ -98,13 +269,15 @@
 	}
 
 where nents is the number of entries in the sglist.
+
 The implementation is free to merge several consecutive sglist entries
 into one (e.g. if DMA mapping is done with PAGE_SIZE granularity, any
 consecutive sglist entries can be merged into one provided the first one
 ends and the second one starts on a page boundary - in fact this is a huge
 advantage for cards which either cannot do scatter-gather or have very
 limited number of scatter-gather entries) and returns the actual number
-of sg entries it mapped them too.
+of sg entries it mapped them to.
+
 Then you should loop count times (note: this can be less than nents times)
 and use sg_dma_address() and sg_dma_length() macros where you previously
 accessed sg->address and sg->length as shown above.
@@ -114,6 +287,12 @@
 	pci_unmap_sg(dev, sglist, nents);
 
 Again, make sure DMA activity finished.
+
+PLEASE NOTE:  The 'nents' argument to the pci_unmap_sg call must be
+              the _same_ one you passed into the pci_map_sg call;
+              it should _NOT_ be the 'count' value _returned_ from
+              the pci_map_sg call.
+
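+A minimal sketch of what this means in practice (MY_NENTS is an
+illustrative placeholder for however many entries you built):
+
+	int nents = MY_NENTS;
+	int count = pci_map_sg(dev, sglist, nents, direction);
+
+	/* ... perform the DMA using the 'count' mapped entries ... */
+
+	pci_unmap_sg(dev, sglist, nents);	/* pass nents, NOT count */
+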
 Every pci_map_{single,sg} call should have its pci_unmap_{single,sg}
 counterpart, because the bus address space is a shared resource (although
 in some ports the mapping is per each BUS so less devices contend for the
@@ -124,15 +303,64 @@
 the data in between the DMA transfers, just map it
 with pci_map_{single,sg}, after each DMA transfer call either:
 
-	pci_dma_sync_single(dev, dma_handle, size);
+	pci_dma_sync_single(dev, dma_handle, size, direction);
 
 or:
 
-	pci_dma_sync_sg(dev, sglist, nents);
+	pci_dma_sync_sg(dev, sglist, nents, direction);
 
 and after the last DMA transfer call one of the DMA unmap routines
 pci_unmap_{single,sg}. If you don't touch the data from the first pci_map_*
-call till pci_unmap_*, then you don't have to call pci_sync_* routines.
+call till pci_unmap_*, then you don't have to call the pci_sync_*
+routines at all.
+
+Here is pseudo-code which shows a situation in which you would need
+to use the pci_dma_sync_*() interfaces.
+
+	my_card_setup_receive_buffer(struct my_card *cp, char *buffer, int len)
+	{
+		dma_addr_t mapping;
+
+		mapping = pci_map_single(cp->pdev, buffer, len, PCI_DMA_FROMDEVICE);
+
+		cp->rx_buf = buffer;
+		cp->rx_len = len;
+		cp->rx_dma = mapping;
+
+		give_rx_buf_to_card(cp);
+	}
+
+	...
+
+	my_card_interrupt_handler(int irq, void *devid, struct pt_regs *regs)
+	{
+		struct my_card *cp = devid;
+
+		...
+		if (read_card_status(cp) == RX_BUF_TRANSFERRED) {
+			struct my_card_header *hp;
+
+			/* Examine the header to see if we wish
+			 * to accept the data.  But synchronize
+			 * the DMA transfer with the CPU first
+			 * so that we see updated contents.
+			 */
+			pci_dma_sync_single(cp->pdev, cp->rx_dma, cp->rx_len,
+					    PCI_DMA_FROMDEVICE);
+
+			/* Now it is safe to examine the buffer. */
+			hp = (struct my_card_header *) cp->rx_buf;
+			if (header_is_ok(hp)) {
+				pci_unmap_single(cp->pdev, cp->rx_dma, cp->rx_len,
+						 PCI_DMA_FROMDEVICE);
+				pass_to_upper_layers(cp->rx_buf);
+				make_and_setup_new_rx_buf(cp);
+			} else {
+				/* Just give the buffer back to the card. */
+				give_rx_buf_to_card(cp);
+			}
+		}
+	}
 
 Drivers converted fully to this interface should not use virt_to_bus any
 longer, nor should they use bus_to_virt. Some drivers have to be changed a
@@ -142,8 +370,14 @@
 stores them in the scatterlist itself if the platform supports dynamic DMA
 mapping in hardware) in your driver structures and/or in the card registers.
 
-For PCI cards which recognize fewer address lines than 32 in Single
-Address Cycle, you should set corresponding pci_dev's dma_mask field to a
-different mask. The dma mapping routines then should either honour your request
-and allocate the DMA only with the bus address with bits set in your
-dma_mask or should complain that the device is not supported on that platform.
+This document, and the API itself, would not be in its current
+form without the feedback and suggestions from numerous individuals.
+We would like to specifically mention, in no particular order, the
+following people:
+
+	Russell King <rmk@arm.linux.org.uk>
+	Leo Dagum <dagum@barrel.engr.sgi.com>
+	Ralf Baechle <ralf@oss.sgi.com>
+	Grant Grundler <grundler@cup.hp.com>
+	Jay Estabrook <Jay.Estabrook@compaq.com>
+	Thomas Sailer <sailer@ife.ee.ethz.ch>
