MTD bad Block issue


Reposted from http://www.stlinux.com/howto/NAND/BadBlocks

Sometimes, an attempt to erase a bad block may appear to work. However, this does not mean that the block is usable. Even if subsequent write operations may appear to work, the reliability of the data cannot be guaranteed. In addition, attempting to erase a bad block risks erasing the MBBM and in some circumstances it may be impossible to recover this information, if further erase operations appear to work. This is obviously a dangerous state of affairs and should be avoided at all costs.

The full text follows:

Bad Blocks

It is in the nature of NAND Flash that a small proportion of the blocks in the device are defective and therefore unusable from the day of manufacture (typically up to 1% is deemed acceptable by the manufacturer). Manufacturers perform thorough testing to identify any potentially bad blocks. When they have been identified, bad blocks are marked with a special marker in the OOB area of the block. This is the Manufacturer's Bad Block Marker (MBBM).

In addition, blocks become "worn" with use and stop being usable after a certain number of write and erase cycles. This condition shows up as an error flag set in the NAND device following an Erase operation. The software that manages the NAND device must implement a "wear-leveling" algorithm to ensure that no blocks suffer from excessive use in comparison to the others. Without wear leveling, the lifetime of the device is dramatically reduced. The NAND-aware filesystems supported by STMicroelectronics all implement some form of wear-leveling strategy.

Sometimes, an attempt to erase a bad block may appear to work. However, this does not mean that the block is usable. Even if subsequent write operations may appear to work, the reliability of the data cannot be guaranteed. In addition, attempting to erase a bad block risks erasing the MBBM and in some circumstances it may be impossible to recover this information, if further erase operations appear to work. This is obviously a dangerous state of affairs and should be avoided at all costs.

The important point to note is that once a block has been identified as bad, either by the manufacturer or later because of an erase failure, that block must be excluded from further use.

Bad Block Management

To cope with the presence of bad blocks, the software must employ some form of bad block management. Typically, it uses a Bad Block Table (BBT) to record all known bad blocks that are present on a device. Before reading from or writing to the NAND device, the software consults the BBT to determine the locations that are safe to use. It must also monitor the status of Erase operations, with all failures being recorded in the BBT.
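The consult-then-record cycle described above can be sketched with a small in-memory model. This is only an illustration: `BadBlockTable` and its methods are hypothetical names invented here, not part of the MTD subsystem or any real driver.

```python
class BadBlockTable:
    """In-memory stand-in for a Bad Block Table (hypothetical, not an MTD API)."""

    def __init__(self, num_blocks, known_bad=()):
        self.num_blocks = num_blocks
        # Blocks already known to be bad, e.g. those carrying an MBBM.
        self.bad = set(known_bad)

    def is_usable(self, block):
        """Consult the table before reading from or writing to a block."""
        if not 0 <= block < self.num_blocks:
            raise ValueError(f"block {block} is outside the device")
        return block not in self.bad

    def record_erase_result(self, block, succeeded):
        """Record a failed Erase operation so the block is never used again."""
        if not succeeded:
            self.bad.add(block)

    def next_usable(self, start):
        """Return the first usable block at or after `start`, or None."""
        for block in range(start, self.num_blocks):
            if block not in self.bad:
                return block
        return None


bbt = BadBlockTable(num_blocks=8, known_bad=[2])
bbt.record_erase_result(3, succeeded=False)  # block 3 wears out in the field
print(bbt.next_usable(2))                    # skips blocks 2 and 3
```

A block is only ever added to the table, never removed, which mirrors the rule that a block identified as bad must be excluded from further use.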

There are two types of BBT: NAND resident (that is, permanently stored in the NAND device itself) and RAM resident (stored only in volatile SDRAM and therefore regenerated on each boot).

RAM resident BBT

RAM-resident BBTs are volatile and must be recreated every time the system is booted. The process involves scanning each block in the NAND device to check for bad block markers.

The main advantage of this approach is simplicity. This is particularly true for manufacturability, where it is possible for a generic NAND programmer to program pre-prepared images without the need to understand the underlying ECC scheme or any BBT formats.

There are, however, a number of disadvantages. In some cases these disadvantages preclude the use of RAM-resident BBTs.

Performance: Typically, scanning the device for bad block markers takes considerably more time than retrieving the BBT from the NAND device.
Marking worn blocks: Blocks that go bad through use must be marked in such a way that they can be detected on subsequent scans. Typically, this involves writing a marker in the OOB area, similar to the manufacturer's marker. However, since the block has gone bad, there is no guarantee that writing a bad block marker will succeed. The block will then fail to be detected as bad on subsequent scans which may lead to data corruption if used later.
ECC layout clashes with the MBBM location: Certain ECC layouts store the ECC data in the same location in the OOB area as that used by the MBBM. This means that after a page has been programmed, the ECC data may be incorrectly interpreted as an MBBM leading to false-positive bad blocks. In some cases, this issue can be avoided by adding tags to the ECC data. In other cases, there is no viable solution, other than to use NAND-resident BBTs.

NAND resident BBT

The use of NAND-resident BBTs overcomes many of the issues associated with RAM-resident BBTs. For most cases this is the recommended method for recording and tracking bad blocks.

As a NAND-resident BBT is non-volatile, it is preserved across system boots. There should never be any reason to recreate the BBT by scanning the NAND device for bad block markers.

Typically, the BBT requires two bits of storage for each block. The table is stored in the last good block with a backup in the penultimate good block. By default, the last four physical blocks are reserved for BBTs. If there are fewer than two good blocks available in the last four, then the NAND device should be discarded.
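The sizing and placement rules in the paragraph above can be worked through in a few lines. This is a sketch of the rules as stated in the text only, not of any actual driver code; the function names and the `reserved` parameter are assumptions made for illustration.

```python
def bbt_size_bytes(num_blocks, bits_per_block=2):
    """Storage needed for the table at two bits per block, rounded up to bytes."""
    return (num_blocks * bits_per_block + 7) // 8


def choose_bbt_blocks(num_blocks, bad_blocks, reserved=4):
    """Pick (primary, backup) BBT blocks from the last `reserved` blocks.

    The primary is the last good block and the backup is the penultimate
    good block. Returns None when fewer than two good blocks remain, which
    is the case where the text says the device should be discarded.
    """
    first_reserved = num_blocks - reserved
    # Walk the reserved area from the end of the device backwards.
    good = [b for b in range(num_blocks - 1, first_reserved - 1, -1)
            if b not in bad_blocks]
    if len(good) < 2:
        return None
    return good[0], good[1]


print(bbt_size_bytes(2048))                               # bytes for 2048 blocks
print(choose_bbt_blocks(2048, bad_blocks={2047}))         # last block is bad
print(choose_bbt_blocks(2048, bad_blocks={2044, 2045, 2046, 2047}))
```

For a 2048-block device the table itself needs only 512 bytes, so a single erase block holds it easily. The constraint that matters is whether two good blocks exist in the reserved area.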

In an ideal situation, the BBT should be built and written to Flash before any other data. This is mandatory in cases where it is not possible to use the ECC tags to distinguish between valid programmed ECC data and an MBBM. However, this has implications for manufacturability, as the NAND programmer needs to be taught how to write the BBT, including the relevant ECC scheme.

In some cases, it may be appropriate for the NAND Programmer to skip writing BBTs, and to defer BBT creation to the software drivers when the system is first booted. This avoids the complexities of customising the NAND Programmer, whilst retaining the benefits of using NAND-resident BBTs. This approach is only viable if there is no clash between the ECC layout and the MBBM location, or where ECC tags can be used to avoid ECC data being misinterpreted as an MBBM.

===================Real-world case===============================

If there are fewer than two good blocks available in the last four, then the NAND device should be discarded.

Bad eraseblock 2044 at 0x0000ff800000
Bad eraseblock 2045 at 0x0000ffa00000
Bad eraseblock 2046 at 0x0000ffc00000
Bad eraseblock 2047 at 0x0000ffe00000
No space left to write bad block table

The last four blocks of this 4 GB flash have all gone bad, so this NAND flash cannot boot.
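The numbers in the log can be cross-checked with a little arithmetic. The sketch below assumes an erase block size of 2 MiB, inferred from the 0x200000 spacing between the logged offsets, and assumes that block 2047 is the last block of the device; neither value is stated explicitly in the log.

```python
ERASE_BLOCK_SIZE = 0x200000  # assumed: spacing between consecutive logged offsets

# Block numbers and offsets copied from the "Bad eraseblock" log lines.
logged = {
    2044: 0xFF800000,
    2045: 0xFFA00000,
    2046: 0xFFC00000,
    2047: 0xFFE00000,
}


def block_offset(block, erase_block_size=ERASE_BLOCK_SIZE):
    """Byte offset of the start of an erase block."""
    return block * erase_block_size


total_blocks = max(logged) + 1                 # assumed: 2047 is the final block
device_size = total_blocks * ERASE_BLOCK_SIZE

for block, offset in logged.items():
    print(block, hex(block_offset(block)), block_offset(block) == offset)
print(hex(device_size))
```

Under these assumptions every logged offset matches block number times erase block size, the device is 4 GiB, and blocks 2044 to 2047 are exactly the last four blocks, which is the area normally reserved for the BBT.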

Using a flash read/write program that I wrote myself:

The NAND flash is 256 MB, split into two partitions: mtd0 (128 MB) and mtd1 (128 MB).

nand_erase_nand: attempt to erase a bad block at page 0x0001ff00

nand_erase_nand: attempt to erase a bad block at page 0x0001ff40

nand_erase_nand: attempt to erase a bad block at page 0x0001ff80

nand_erase_nand: attempt to erase a bad block at page 0x0001ffc0

For comparison, these are the start offsets of the bad blocks as reported by ioctl(fd, MEMGETBADBLOCK, &offset):

Skipping bad block at 0x07f80000

Skipping bad block at 0x07fa0000

Skipping bad block at 0x07fc0000

Skipping bad block at 0x07fe0000

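NOTE: 1 block = 64 pages

On this NAND flash, 1 block = 128 KB, so 1 page = 2 KB.

Because I ran the experiment on mtd1, the page count of mtd0 must first be subtracted from the page number.

mtd0 is 128 MB = 0x8000000 bytes and 1 page = 2 KB = 0x800 bytes, so mtd0 contains 0x8000000 / 0x800 = 0x10000 pages.

Therefore: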

0x0001ff00 - 0x10000 = 0xff00, 0xff00 * 0x800 (2 KB) = 0x7f80000

0x0001ff40 - 0x10000 = 0xff40, 0xff40 * 0x800 = 0x7fa0000

0x0001ff80 - 0x10000 = 0xff80, 0xff80 * 0x800 = 0x7fc0000

0x0001ffc0 - 0x10000 = 0xffc0, 0xffc0 * 0x800 = 0x7fe0000
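The same conversion can be written as a short script. The constants are the values given above for this particular chip (2 KB pages, 128 MB mtd0); the function name `page_to_mtd1_offset` is made up for this example.

```python
PAGE_SIZE = 0x800      # 2 KB per page
MTD0_SIZE = 0x8000000  # mtd0 is 128 MB and precedes mtd1 on the chip


def page_to_mtd1_offset(chip_page):
    """Convert a chip-wide page number into a byte offset inside mtd1."""
    mtd0_pages = MTD0_SIZE // PAGE_SIZE  # 0x10000 pages belong to mtd0
    return (chip_page - mtd0_pages) * PAGE_SIZE


# Page numbers taken from the nand_erase_nand messages.
for page in (0x0001FF00, 0x0001FF40, 0x0001FF80, 0x0001FFC0):
    print(f"page 0x{page:08x} -> mtd1 offset 0x{page_to_mtd1_offset(page):08x}")
```

The four results are the same offsets that appear in the "Skipping bad block" messages, so the kernel's page numbers and the MEMGETBADBLOCK results refer to the same four blocks at the end of mtd1.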
