git.proxmox.com Git - mirror_ubuntu-bionic-kernel.git/log

f2fs: give a chance to merge IOs by IO scheduler

Previously, background GC submits many 4KB read requests to load victim blocks
and/or its (i)node blocks.

...
f2fs_gc : f2fs_readpage: ino = 1, page_index = 0xb61, blkaddr = 0x3b964ed
f2fs_gc : block_rq_complete: 8,16 R () 499854968 + 8 [0]
f2fs_gc : f2fs_readpage: ino = 1, page_index = 0xb6f, blkaddr = 0x3b964ee
f2fs_gc : block_rq_complete: 8,16 R () 499854976 + 8 [0]
f2fs_gc : f2fs_readpage: ino = 1, page_index = 0xb79, blkaddr = 0x3b964ef
f2fs_gc : block_rq_complete: 8,16 R () 499854984 + 8 [0]
...

However, by the fact that many IOs are sequential, we can give a chance to merge
the IOs by IO scheduler.
In order to do that, let's use blk_plug.

...
f2fs_gc : f2fs_iget: ino = 143
f2fs_gc : f2fs_readpage: ino = 143, page_index = 0x1c6, blkaddr = 0x2e6ee
f2fs_gc : f2fs_iget: ino = 143
f2fs_gc : f2fs_readpage: ino = 143, page_index = 0x1c7, blkaddr = 0x2e6ef
<idle> : block_rq_complete: 8,16 R () 1519616 + 8 [0]
<idle> : block_rq_complete: 8,16 R () 1519848 + 8 [0]
<idle> : block_rq_complete: 8,16 R () 1520432 + 96 [0]
<idle> : block_rq_complete: 8,16 R () 1520536 + 104 [0]
<idle> : block_rq_complete: 8,16 R () 1521008 + 112 [0]
<idle> : block_rq_complete: 8,16 R () 1521440 + 152 [0]
<idle> : block_rq_complete: 8,16 R () 1521688 + 144 [0]
<idle> : block_rq_complete: 8,16 R () 1522128 + 192 [0]
<idle> : block_rq_complete: 8,16 R () 1523256 + 328 [0]
...

Note that this issue should be addressed in checkpoint, and some readahead
flows too.

Reviewed-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>

f2fs: avoid frequent background GC

If there is no victim segments selected by background GC, let's wait
a little bit longer time to collect dirty segments.
By default, let's give 5 minutes.

Reviewed-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>

f2fs: add tracepoints to debug checkpoint request

Add tracepoints to debug checkpoint request.

Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Pankaj Kumar <pankaj.km@samsung.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
[Jaegeuk: change expressions]
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>

f2fs: add tracepoints for write page operations

Add tracepoints to debug the various page write operation
like data pages, meta pages.

Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Pankaj Kumar <pankaj.km@samsung.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
[Jaegeuk: remove unnecessary tracepoints]
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>

f2fs: add tracepoints to debug the block allocation

Add tracepoints to debug the block allocation & fallocate.

Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Pankaj Kumar <pankaj.km@samsung.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
[Jaegeuk: enhance information]
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>

f2fs: add tracepoints for GC threads

Add tracepoints for tracing the garbage collector
threads in f2fs with status of collection & type.

Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Pankaj Kumar <pankaj.km@samsung.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
[Jaegeuk: modify slightly to show information]
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>

f2fs: add tracepoint for tracing the page i/o

Add tracepoints for page i/o operations and block allocation
tracing during page read operation.

Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Pankaj Kumar <pankaj.km@samsung.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
[Jaegeuk: combine and modify the tracepoint structures]
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>

f2fs: add tracepoints for truncate operation

add tracepoints for tracing the truncate operations
like truncate node/data blocks, f2fs_truncate etc.

Tracepoints are added at entry and exit of operation
to trace the success & failure of operation.

Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Pankaj Kumar <pankaj.km@samsung.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
[Jaegeuk: combine and modify the tracepoint structures]
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>

f2fs: add tracepoints for sync & inode operations

Add tracepoints in f2fs for tracing the syncing
operations like filesystem sync, file sync enter/exit.
It will helf to trace the code under debugging scenarios.

Also add tracepoints for tracing the various inode operations
like building inode, eviction of inode, link/unlike of
inodes.

Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Pankaj Kumar <pankaj.km@samsung.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
[Jaegeuk: combine and modify the tracepoint structures]
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>

f2fs: make is_multimedia_file code align with its name

The code conditions put inside the function is_multimedia_file are
reverse to the name i.e, we need to negate the return to actually
check if the file is a multimedia file. So, change the code and usage
path to align both the name and comparision conditions.

Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Amit Sahrawat <a.sahrawat@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>

f2fs: fix error return code in f2fs_fill_super()

Fix to return a negative error code from the error handling
case instead of 0, as returned elsewhere in this function.
Introduce by commit c0d39e(f2fs: fix return values from validate superblock)

Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Acked-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>

f2fs: fix typo mistakes

Fix typo mistakes.
1. I think that it should be 'L' instead of 'V'.
2. and try to fix 'Front' instead of 'Frone'

Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Amit Sahrawat <a.sahrawat@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>

f2fs: write checkpoint before starting FG_GC

In order to be aware of prefree and free sections during FG_GC, let's start with
write_checkpoint().

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>

f2fs: fix the logic of IS_DNODE()

If (ofs % (NIDS_PER_BLOCK + 1) == 0), the node is an indirect node block.

Signed-off-by: Zhihui Zhang <zzhsuny@gmail.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>

f2fs: introduce a new global lock scheme

In the previous version, f2fs uses global locks according to the usage types,
such as directory operations, block allocation, block write, and so on.

Reference the following lock types in f2fs.h.
enum lock_type {
RENAME, /* for renaming operations */
DENTRY_OPS, /* for directory operations */
DATA_WRITE, /* for data write */
DATA_NEW, /* for data allocation */
DATA_TRUNC, /* for data truncate */
NODE_NEW, /* for node allocation */
NODE_TRUNC, /* for node truncate */
NODE_WRITE, /* for node write */
NR_LOCK_TYPE,
};

In that case, we lose the performance under the multi-threading environment,
since every types of operations must be conducted one at a time.

In order to address the problem, let's share the locks globally with a mutex
array regardless of any types.
So, let users grab a mutex and perform their jobs in parallel as much as
possbile.

For this, I propose a new global lock scheme as follows.

0. Data structure
- f2fs_sb_info -> mutex_lock[NR_GLOBAL_LOCKS]
- f2fs_sb_info -> node_write

1. mutex_lock_op(sbi)
- try to get an avaiable lock from the array.
- returns the index of the gottern lock variable.

2. mutex_unlock_op(sbi, index of the lock)
- unlock the given index of the lock.

3. mutex_lock_all(sbi)
- grab all the locks in the array before the checkpoint.

4. mutex_unlock_all(sbi)
- release all the locks in the array after checkpoint.

5. block_operations()
- call mutex_lock_all()
- sync_dirty_dir_inodes()
- grab node_write
- sync_node_pages()

Note that,
the pairs of mutex_lock_op()/mutex_unlock_op() and
mutex_lock_all()/mutex_unlock_all() should be used together.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>

f2fs: move f2fs_balance_fs from truncate to punch_hole

Move the f2fs_balance_fs out of the truncate_hole function and only
perform that in punch_hole use case.  The commit:

  ed60b1644e7f7e5dd67d21caf7e4425dff05dad0

intended to do this but moved it into truncate_hole to cover more
cases.  However, a deadlock scenario is possible when deleting an inode
entry under specific conditions:

f2fs_delete_entry()
     mutex_lock_op(sbi, DENTRY_OPS);
     truncate_hole()
         f2fs_balance_fs()
             mutex_lock(&sbi->gc_mutex);
             f2fs_gc()
                 write_checkpoint()
                     block_operations()
                         mutex_lock_op(sbi, DENTRY_OPS);

Lets move it into the punch_hole case to cover the original intent of
avoiding it during fallocate's expand_inode_data case.

Change-Id: I29f8ea1056b0b88b70ba8652d901b6e8431bb27e
Signed-off-by: Jason Hrycay <jason.hrycay@motorola.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>

f2fs: reduce redundant spin_lock operations

This patch reduces redundant spin_lock operations in alloc_nid_failed().
The alloc_nid_failed() does not need to delete entry and add one again
by triggering spin_lock and spin_unlock redundantly.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>

f2fs: update f2fs.txt related with discard at mkfs

o mkfs.f2fs supports no discard option.
o fixed volume label size in 512 bytes.

Signed-off-by: Changman Lee <cm224.lee@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>

f2fs: add NULL pointer check

Commit - fa9150a84c - replaces a call to generic_writepages() in
f2fs_write_data_pages() with write_cache_pages(), with a function pointer
argument pointing to routine: __f2fs_writepage.

  -> https://git.kernel.org/linus/fa9150a84ca333f68127097c4fa1eda4b3913a22

  This patch adds a NULL pointer check in f2fs_write_data_pages() to avoid
  a possible NULL pointer dereference, in case if - mapping->a_ops->writepage -
  is NULL.

Signed-off-by: P J P <ppandit@redhat.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>

f2fs: fix the bitmap consistency of dirty segments

Like below, there are 8 segment bitmaps for SSR victim candidates.

enum dirty_type {
DIRTY_HOT_DATA, /* dirty segments assigned as hot data logs */
DIRTY_WARM_DATA, /* dirty segments assigned as warm data logs */
DIRTY_COLD_DATA, /* dirty segments assigned as cold data logs */
DIRTY_HOT_NODE, /* dirty segments assigned as hot node logs */
DIRTY_WARM_NODE, /* dirty segments assigned as warm node logs */
DIRTY_COLD_NODE, /* dirty segments assigned as cold node logs */
DIRTY, /* to count # of dirty segments */
PRE, /* to count # of entirely obsolete segments */
NR_DIRTY_TYPE
};

The upper 6 bitmaps indicates segments dirtied by active log areas respectively.
And, the DIRTY bitmap integrates all the 6 bitmaps.

For example,
o DIRTY_HOT_DATA : 1010000
o DIRTY_WARM_DATA: 0100000
o DIRTY_COLD_DATA: 0001000
o DIRTY_HOT_NODE : 0000010
o DIRTY_WARM_NODE: 0000001
o DIRTY_COLD_NODE: 0000000
In this case,
o DIRTY : 1111011,

which means that we should guarantee the consistency between DIRTY and other
bitmaps concreately.

However, the SSR mode selects victims freely from any log types, which can set
multiple bits across the various bitmap types.

So, this patch eliminates this inconsistency.

Reviewed-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>

f2fs: avoid race for summary information

In order to do GC more reliably, I'd like to lock the vicitm summary page
until its GC is completed, and also prevent any checkpoint process.

Reviewed-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>

f2fs: allocate remained free segments in the LFS mode

This patch adds a new condition that allocates free segments in the current
active section even if SSR is needed.
Otherwise, f2fs cannot allocate remained free segments in the section since
SSR finds dirty segments only.

Reviewed-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>

f2fs: check completion of foreground GC

The foreground GCs are triggered under not enough free sections.
So, we should not skip moving valid blocks in the victim segments.

Reviewed-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>

f2fs: change GC bitmaps to apply the section granularity

This patch removes a bitmap for victim segments selected by foreground GC, and
modifies the other bitmap for victim segments selected by background GC.

1) foreground GC bitmap
: We don't need to manage this, since we just only one previous victim section
   number instead of the whole victim history.
   The f2fs uses the victim section number in order not to allocate currently
   GC'ed section to current active logs.

2) background GC bitmap
: This bitmap is used to avoid selecting victims repeatedly by background GCs.
   In addition, the victims are able to be selected by foreground GCs, since
   there is no need to read victim blocks during foreground GCs.

   By the fact that the foreground GC reclaims segments in a section unit, it'd
   be better to manage this bitmap based on the section granularity.

Reviewed-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>

f2fs: allocate new segment aligned with sections

When allocating a new segment under the LFS mode, we should keep the section
boundary.

Reviewed-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>

f2fs: remove redundant lock_page calls

In get_node_page, we do not need to call lock_page all the time.

If the node page is cached as uptodate,

1. grab_cache_page locks the page,
2. read_node_page unlocks the page, and
3. lock_page is called for further process.

Let's avoid this.

Reviewed-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>

f2fs: introduce TOTAL_SECS macro

Let's use a macro to get the total number of sections.

Reviewed-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>

f2fs: do not use duplicate names in a macro

A macro should not use duplicate parameter names.

Reviewed-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>

f2fs: use kmemdup

Use kmemdup instead of kzalloc and memcpy.

Signed-off-by: Alexandru Gheorghiu <gheorghiuandru@gmail.com>
Acked-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>

f2fs: fix to give correct parent inode number for roll forward

When we recover fsync'ed data after power-off-recovery, we should guarantee
that any parent inode number should be correct for each direct inode blocks.

So, let's make the following rules.

- The fsync should do checkpoint to all the inodes that were experienced hard
links.

- So, the only normal files can be recovered by roll-forward.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>

f2fs: remain nat cache entries for further free nid allocation

In the checkpoint flow, the f2fs investigates the total nat cache entries.
Previously, if an entry has NULL_ADDR, f2fs drops the entry and adds the
obsolete nid to the free nid list.
However, this free nid will be reused sooner, resulting in its nat entry miss.
In order to avoid this, we don't need to drop the nat cache entry at this moment.

Reviewed-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>

f2fs: do not skip writing file meta during fsync

This patch removes data_version check flow during the fsync call.
The original purpose for the use of data_version was to avoid writng inode
pages redundantly by the fsync calls repeatedly.
However, when user can modify file meta and then call fsync, we should not
skip fsync procedure.
So, let's remove this condition check and hope that user triggers in right
manner.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>

f2fs: fix the recovery flow to handle errors correctly

We should handle errors during the recovery flow correctly.
For example, if we get -ENOMEM, we should report a mount failure instead of
conducting the remained mount procedure.

Reviewed-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>

f2fs: fix typo in comments

Correct spelling typo in comments

Signed-off-by: Masanari Iida <standby24x7@gmail.com>
Acked-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>

f2fs: avoid BUG_ON from check_nid_range and update return path in do_read_inode

In function check_nid_range, there is no need to trigger BUG_ON and make kernel stop.
Instead it could just check and indicate the inode number to be EINVAL.
Update the return path in do_read_inode to use the return from check_nid_range.

Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Amit Sahrawat <a.sahrawat@samsung.com>
[Jaegeuk: replace BUG_ON with WARN_ON]
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>

f2fs: fix return values from validate superblock

validate super block is not returning with proper values.
When failure from sb_bread it should reflect there is an EIO otherwise
it should return of EINVAL.
Returning, '1' is not conveying proper message as the return type.

Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Amit Sahrawat <a.sahrawat@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>

f2fs: reorganize f2fs_setxattr

make use of F2FS_NAME_LEN for name length checking,
change return conditions at few places, by assigning
storing the errorvalue in 'error' and making a common
exit path.

Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Amit Sahrawat <a.sahrawat@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>

f2fs: notify when discard is not supported

Change f2fs so that a warning is emitted when an attempt is made to
mount a filesystem with the unsupported discard option.

Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Amit Sahrawat <a.sahrawat@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>

f2fs: fix to call WRITE_FLUSH at the end of fsync

The fsync call should be ended after flushing the in-device caches.

Reviewed-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>

f2fs: fix not to allocate max_nid

The build_free_nid should not add free nids over nm_i->max_nid.
But, there was a hole that invalid free nid was added by the following scenario.

Let's suppose nm_i->max_nid = 150 and the last NAT page has 100 ~ 200 nids.

build_free_nids
  - get_current_nat_page loads the last NAT page
  - scan_nat_page can add 100 ~ 200 nids
    -> Bug here!
So, when scanning an NAT page, we should check each candidate whether it is
over max_nid or not.

Reviewed-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>

f2fs: fix return value of releasepage for node and data

If the return value of releasepage is equal to zero, the page cannot be reclaimed.
Instead, we should return 1 in order to reclaim clean pages.

Reviewed-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>

f2fs: scan next nat page to reuse free nids in there

When we build new free nids, let's scan the just next NAT page instead of
skipping a couple of previously scanned pages in order to reuse free nids in
there.
Otherwise, we can use too much wide range of nids even though several nids were
deallocated, and also their node pages can be cached in the node_inode's address
space.
This means that we can retain lots of clean pages in the main memory, which
induces mm's reclaiming overhead.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>

f2fs: should check the node page was truncated first

Currently, f2fs doesn't reclaim any node pages.
However, if we found that a node page was truncated by checking its block
address with zero during f2fs_write_node_page, we should not skip that node
page and return zero to reclaim it.

Reviewed-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>

f2fs: reduce unncessary locking pages during read

This patch reduces redundant locking and unlocking pages during read operations.
In f2fs_readpage, let's use wait_on_page_locked() instead of lock_page.
And then, when we need to modify any data finally, let's lock the page so that
we can avoid lock contention.

[readpage rule]
- The f2fs_readpage returns unlocked page, or released page too in error cases.
- Its caller should handle read error, -EIO, after locking the page, which
indicates read completion.
- Its caller should check PageUptodate after grab_cache_page.

Signed-off-by: Changman Lee <cm224.lee@samsung.com>
Reviewed-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>

f2fs: avoid extra ++ while returning from get_node_path

In all the breaking conditions in get_node_path, 'n' is used to
track index in offset[] array, but while breaking out also, in all
paths n++ is done.
So, remove the ++ from breaking paths. Also, avoid
reset of 'level=0' in first case.

Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Amit Sahrawat <a.sahrawat@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>

f2fs: align f2fs maximum name length to linux based filesystem

The maximum filename length supported in linux is 255 characters.
So let's follow that.

Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Amit Sahrawat <a.sahrawat@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>

f2fs: optimize and change return path in lookup_free_nid_list

Optimize and change return path in lookup_free_nid_list

Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Amit Sahrawat <a.sahrawat@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>

f2fs: optimize get node page readahead part

We can remove the call to find_get_page to get a page from the cache
and check for up-to-date, instead we can make use of grab_cache_page
part itself to fetch the page from the cache.
So, removing the call and moving the PageUptodate at proper place, also
taken care of moving the lock_page condition in the page_hit part.

Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Amit Sahrawat <a.sahrawat@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>

f2fs: check the level before calling get_nid function

The caller of get_nid should be careful not to put lower value than
NODE_DIR1_BLOCK in case of level is zero.

Signed-off-by: Changman Lee <cm224.lee@samsung.com>
Reviewed-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>

f2fs: introduce readahead mode of node pages

Previously, f2fs reads several node pages ahead when get_dnode_of_data is called
with RDONLY_NODE flag.
And, this flag is set by the following functions.
- get_data_block_ro
- get_lock_data_page
- do_write_data_page
- truncate_blocks
- truncate_hole

However, this readahead mechanism is initially introduced for the use of
get_data_block_ro to enhance the sequential read performance.

So, let's clarify all the cases with the additional modes as follows.

enum {
ALLOC_NODE, /* allocate a new node page if needed */
LOOKUP_NODE, /* look up a node without readahead */
LOOKUP_NODE_RA, /*
* look up a node with readahead called
* by get_datablock_ro.
*/
}

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
Reviewed-by: Namjae Jeon <namjae.jeon@samsung.com>

f2fs: read with READ_SYNC when getting dnode page

The get_node_page_ra tries to:
1. grab or read a target node page for the given nid,
2. then, call ra_node_page to read other adjacent node pages in advance.

So, when we try to read a target node page by #1, we should submit bio with
READ_SYNC instead of READA.
And, in #2, READA should be used.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
Reviewed-by: Namjae Jeon <namjae.jeon@samsung.com>

f2fs: fix to unlock node page when it was truncated

If the node page was truncated, its block address became zero.
This means that we don't need to write the node page, but have to unlock
NODE_WRITE, decrease the number of dirty node pages, and then unlock_page
before returning the f2fs_write_node_page with zero.

Reviewed-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>

f2fs: fix overflow when calculating utilization on 32-bit

Use div_u64 to fix overflow when calculating utilization.
*long int* is 4-bytes on 32-bit so (user blocks * 100) might be
overflow if disk size is over e.g. 512GB.

Signed-off-by: Changman Lee <cm224.lee@samsung.com>
Reviewed-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>

Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 fixes from Peter Anvin:
"Several boot fixes (MacBook, legacy EFI bootloaders), another
  please-don't-brick fix, and some minor stuff."

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86: Do not try to sync identity map for non-mapped pages
  x86, doc: Be explicit about what the x86 struct boot_params requires
  x86: Don't clear efi_info even if the sentinel hits
  x86, mm: Make sure to find a 2M free block for the first mapped area
  x86: Fix 32-bit *_cpu_data initializers
  efivarfs: return accurate error code in efivarfs_fill_super()
  efivars: efivarfs_valid_name() should handle pstore syntax
  efi: be more paranoid about available space when creating variables
  iommu, x86: Add DMA remap fault reason
  x86, smpboot: Remove unused variable

Merge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linux

Pull drm fixes from Dave Airlie:
"Misc radeon, nouveau, mgag200 and intel fixes.

  The intel fixes should contain the fix for the touchpad on the
  Chromebook - hey I'm an input maintainer now!"

Hate to pee on your parade, Dave, but I don't think being an input
maintainer is necessarily something to strive for..

* 'drm-fixes' of git://people.freedesktop.org/~airlied/linux: (25 commits)
  drm/tegra: drop "select DRM_HDMI"
  drm: Documentation typo fixes
  drm/mgag200: Bug fix: Renesas board now selects native resolution.
  drm/mgag200: Reject modes that are too big for VRAM
  drm/mgag200: 'fbdev_list' in 'struct mga_fbdev' is not used
  drm/radeon: don't check mipmap alignment if MIP_ADDRESS is FMASK
  drm/radeon: skip MC reset as it's probably not hung
  drm/radeon: add primary dac adj quirk for R200 board
  drm/radeon: don't set hpd, afmt interrupts when interrupts are disabled
  drm/i915: Turn off hsync and vsync on ADPA when disabling crt
  drm/i915: Fix incorrect definition of ADPA HSYNC and VSYNC bits
  drm/i915: also disable south interrupts when handling them
  drm/i915: enable irqs earlier when resuming
  drm/i915: Increase the RC6p threshold.
  DRM/i915: On G45 enable cursor plane briefly after enabling the display plane.
  drm/nv50-: prevent some races between modesetting and page flipping
  drm/nouveau/i2c: drop parent refcount when creating ports
  drm/nv84: fix regression in page flipping
  drm/nouveau: Fix typo in init_idx_addr_latched().
  drm/nouveau: Disable AGP on PowerPC again.
  ...

Merge tag 'pm+acpi-3.9-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull ACPI and power management fixes from Rafael J Wysocki:

- Two fixes for the new intel_pstate driver from Dirk Brandewie.

- Fix for incorrect usage of the .find_bridge() callback from struct
   acpi_bus_type in the USB core and subsequent removal of that callback
   from Rafael J Wysocki.

- ACPI processor driver cleanups from Chen Gang and Syam Sidhardhan.

- ACPI initialization and error messages fix from Joe Perches.

- Operating Performance Points documentation improvement from Nishanth
   Menon.

- Fixes for memory leaks and potential concurrency issues and sysfs
  attributes leaks during device removal in the core device PM QoS code
  from Rafael J Wysocki.

- Calxeda Highbank cpufreq driver simplification from Emilio López.

- cpufreq comment cleanup from Namhyung Kim.

- Fix for a section mismatch in Calxeda Highbank interprocessor
   communication code from Mark Langsdorf (this is not a PM fix strictly
   speaking, but the code in question went in through the PM tree).

* tag 'pm+acpi-3.9-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  cpufreq / intel_pstate: Do not load on VM that does not report max P state.
  cpufreq / intel_pstate: Fix intel_pstate_init() error path
  ACPI / glue: Drop .find_bridge() callback from struct acpi_bus_type
  ACPI / glue: Add .match() callback to struct acpi_bus_type
  ACPI / porocessor: Beautify code, pr->id is u32 which is never < 0
  ACPI / processor: Remove redundant NULL check before kfree
  ACPI / Sleep: Avoid interleaved message on errors
  PM / QoS: Remove device PM QoS sysfs attributes at the right place
  PM / QoS: Fix concurrency issues and memory leaks in device PM QoS
  cpufreq: highbank: do not initialize array with a loop
  PM / OPP: improve introductory documentation
  cpufreq: Fix a typo in comment
  mailbox, pl320-ipc: remove __init from probe function

drm/tegra: drop "select DRM_HDMI"

Commit ac24c2204a76e5b42aa103bf963ae0eda1b827f3 ("drm/tegra: Use generic
HDMI infoframe helpers") added "select DRM_HDMI" to the DRM_TEGRA
Kconfig entry. But there is no Kconfig symbol named DRM_HDMI. The select
statement for that symbol is a nop. Drop it.

What was needed to use HDMI functionality was to select HDMI (which this
entry already did through depending on DRM) and to include linux/hdmi.h
(which this commit also did).

Signed-off-by: Paul Bolle <pebolle@tiscali.nl>
Acked-by: Thierry Reding <thierry.reding@avionic-design.de>
Signed-off-by: Dave Airlie <airlied@redhat.com>

drm: Documentation typo fixes

Signed-off-by: Christopher Harvey <charvey@matrox.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>

drm/mgag200: Bug fix: Renesas board now selects native resolution.

Renesas boards were consistently defaulting to the 1024x768 resolution,
regardless of the native resolution of the monitor plugged in.  It was
determined that the EDID of the monitor was not being read.  Since the
DAC is a shared line, in order to read from or write to it we must take
control of the DAC clock.  This can be done by setting the proper
register to one.

This bug fix sets the register MGA1064_GEN_IO_CTL2 to one.  The DAC
control line can be used to determine whether or not a new monitor has
been plugged in.  But since the hotplug feature is not one we will
support, it has been decided to simply leave the register set to one.

Signed-off-by: Julia Lemire <jlemire@matrox.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>

drm/mgag200: Reject modes that are too big for VRAM

A monitor or a user could request a resolution greater than the
available VRAM for the backing framebuffer. This change checks the
required framebuffer size against the max VRAM size and rejects modes
if they are too big. This change can also remove a mode request passed
in via the video= parameter.

Signed-off-by: Christopher Harvey <charvey@matrox.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>

drm/mgag200: 'fbdev_list' in 'struct mga_fbdev' is not used

Signed-off-by: Christopher Harvey <charvey@matrox.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>

Merge branch 'drm-fixes-3.9' of git://people.freedesktop.org/~agd5f/linux into drm-next

Alex writes:
  Radeon fixes pull.  Not much to it.
  - fix some splatter if the interrupt handler isn't registered
  - Add a quirk for an old R200 board to fix washed out colors on the DAC
  - Don't try and soft reset the MC when we reset the GPU.  It usually doesn't
    need it and doesn't always work reliably.
  - A CS checker fix from Marek

* 'drm-fixes-3.9' of git://people.freedesktop.org/~agd5f/linux:
  drm/radeon: don't check mipmap alignment if MIP_ADDRESS is FMASK
  drm/radeon: skip MC reset as it's probably not hung
  drm/radeon: add primary dac adj quirk for R200 board
  drm/radeon: don't set hpd, afmt interrupts when interrupts are disabled

Merge branch 'fixes' of git://git.linaro.org/people/rmk/linux-arm

Pull ARM fixes from Russell King:
"Mainly a group of fixes, the only exception is the wiring up of the
  kcmp syscall now that those patches went in during the last merge
  window."

* 'fixes' of git://git.linaro.org/people/rmk/linux-arm:
  ARM: 7668/1: fix memset-related crashes caused by recent GCC (4.7.2) optimizations
  ARM: 7667/1: perf: Fix section mismatch on armpmu_init()
  ARM: 7666/1: decompressor: add -mno-single-pic-base for building the decompressor
  ARM: 7665/1: Wire up kcmp syscall
  ARM: 7664/1: perf: remove erroneous semicolon from event initialisation
  ARM: 7663/1: perf: fix ARMv7 EVTYPE_MASK to include NSH bit
  ARM: 7662/1: hw_breakpoint: reset debug logic on secondary CPUs in s2ram resume
  ARM: 7661/1: mm: perform explicit branch predictor maintenance when required
  ARM: 7660/1: tlb: add branch predictor maintenance operations
  ARM: 7659/1: mm: make mm->context.id an atomic64_t variable
  ARM: 7658/1: mm: fix race updating mm->context.id on ASID rollover
  ARM: 7657/1: head: fix swapper and idmap population with LPAE and big-endian
  ARM: 7655/1: smp_twd: make twd_local_timer_of_register() no-op for nosmp
  ARM: 7652/1: mm: fix missing use of 'asid' to get asid value from mm->context.id
  ARM: 7642/1: netx: bump IRQ offset to 64

Merge tag 'efi-for-3.9-rc2' into x86/urgent

EFI changes for v3.9-rc2,

  * Make the EFI variable code more paranoid about running out of
    space in NVRAM, since this is the root cause of the recent issue
    where machines refuse to boot - from Matthew Garrett.

  * Some efivarfs patches that fix regressions introduced in v3.9-rc1.

Signed-off-by: H. Peter Anvin <hpa@zytor.com>

x86: Do not try to sync identity map for non-mapped pages

kernel_map_sync_memtype() is called from a variety of contexts.  The
pat.c code that calls it seems to ensure that it is not called for
non-ram areas by checking via pat_pagerange_is_ram().  It is important
that it only be called on the actual identity map because there *IS*
no map to sync for highmem pages, or for memory holes.

The ioremap.c uses are not as careful as those from pat.c, and call
kernel_map_sync_memtype() on PCI space which is in the middle of the
kernel identity map _range_, but is not actually mapped.

This patch adds a check to kernel_map_sync_memtype() which probably
duplicates some of the checks already in pat.c.  But, it is necessary
for the ioremap.c uses and shouldn't hurt other callers.

I have reproduced this bug and this patch fixes it for me and the
original bug reporter:

https://lkml.org/lkml/2013/2/5/396

Signed-off-by: Dave Hansen <dave@linux.vnet.ibm.com>
Link: http://lkml.kernel.org/r/20130307163151.D9B58C4E@kernel.stglabs.ibm.com
Signed-off-by: Dave Hansen <dave@sr71.net>
Tested-by: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>

Merge tag 'regulator-3.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator

Pull regulator fixes from Mark Brown:
"A few small things here and there, nothing major here really.  The
  conversion of twl4030ldo_ops to get_voltage_sel is a fix, as covered
  in the commit log it fixes inconsistency in handling of the IS_UNSUP()
  feature in the driver."

* tag 'regulator-3.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator:
  regulator: fixed regulator_bulk_enable unwinding code
  regulator: twl: Convert twl4030ldo_ops to get_voltage_sel
  regulator: palmas: fix number of SMPS voltages
  regulator: core: fix documentation error in regulator_allow_bypass
  regulator: core: update kernel documentation for regulator_desc
  regulator: db8500-prcmu - remove incorrect __exit markup

Merge tag 'regmap-v3.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap

Pull regmap PM fix from Mark Brown:
"A simple fix to stop us leaking a runtime PM reference in the case
where we fail to enable a device."

* tag 'regmap-v3.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap:
regmap: irq: call pm_runtime_put in pm_runtime_get_sync failed case

Merge tag 'ecryptfs-3.9-rc2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tyhicks/ecryptfs

Pull ecryptfs fixes from Tyler Hicks:
"Minor code cleanups and new Kconfig option to disable /dev/ecryptfs

  The code cleanups fix up W=1 compiler warnings and some unnecessary
  checks.  The new Kconfig option, defaulting to N, allows the rarely
  used eCryptfs kernel to userspace communication channel to be compiled
  out.  This may be the first step in it being eventually removed."

Hmm.  I'm not sure whether these should be called "fixes", and it
probably should have gone in the merge window.  But I'll let it slide.

* tag 'ecryptfs-3.9-rc2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tyhicks/ecryptfs:
  eCryptfs: allow userspace messaging to be disabled
  eCryptfs: Fix redundant error check on ecryptfs_find_daemon_by_euid()
  ecryptfs: ecryptfs_msg_ctx_alloc_to_free(): remove kfree() redundant null check
  eCryptfs: decrypt_pki_encrypted_session_key(): remove kfree() redundant null check
  eCryptfs: remove unneeded checks in virt_to_scatterlist()
  eCryptfs: Fix -Wmissing-prototypes warnings
  eCryptfs: Fix -Wunused-but-set-variable warnings
  eCryptfs: initialize payload_len in keystore.c

Merge tag 'hwmon-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging

Pull hwmon patches from Guenter Roeck:
"Bug fixes for sht15 and ltc2978 driver plus some documentation
  updates"

* tag 'hwmon-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
  hwmon: (sht15) Check return value of regulator_enable()
  hwmon: (adt7410) Document ADT7420 support
  hwmon: (pmbus/ltc2978) Use detected chip ID to select supported functionality
  hwmon: (pmbus/ltc2978) Fix peak attribute handling
  hwmon: (pmbus/ltc2978) Update datasheet links
  hwmon: Update my e-mail address in driver documentation

drm/radeon: don't check mipmap alignment if MIP_ADDRESS is FMASK

The MIP_ADDRESS state has 2 meanings. If the texture has one sample
per pixel, it's a pointer to the mipmap chain. If the texture has
multiple samples per pixel, it's a pointer to FMASK, a metadata buffer
needed for reading compressed MSAA textures. The mipmap
alignment rules do not apply to FMASK.

Signed-off-by: Marek Olšák <maraeo@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/radeon: skip MC reset as it's probably not hung

The MC is mostly likely busy (e.g., display requests), not hung
so no need to reset it. Doing an MC reset is tricky and not
particularly reliable. Fixes hangs in certain cases.

Reported-by: Josh Boyer <jwboyer@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/radeon: add primary dac adj quirk for R200 board

vbios values are wrong leading to colors that are
too bright. Use the default values instead.

Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org

drm/radeon: don't set hpd, afmt interrupts when interrupts are disabled

Avoids splatter if the interrupt handler is not registered due
to acceleration being disabled.

Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Jerome Glisse <jglisse@redhat.com>
Cc: stable@vger.kernel.org

ARM: 7668/1: fix memset-related crashes caused by recent GCC (4.7.2) optimizations

Recent GCC versions (e.g. GCC-4.7.2) perform optimizations based on
assumptions about the implementation of memset and similar functions.
The current ARM optimized memset code does not return the value of
its first argument, as is usually expected from standard implementations.

For instance in the following function:

void debug_mutex_lock_common(struct mutex *lock, struct mutex_waiter *waiter)
{
memset(waiter, MUTEX_DEBUG_INIT, sizeof(*waiter));
waiter->magic = waiter;
INIT_LIST_HEAD(&waiter->list);
}

compiled as:

800554d0 <debug_mutex_lock_common>:
800554d0:       e92d4008        push    {r3, lr}
800554d4:       e1a00001        mov     r0, r1
800554d8:       e3a02010        mov     r2, #16 ; 0x10
800554dc:       e3a01011        mov     r1, #17 ; 0x11
800554e0:       eb04426e        bl      80165ea0 <memset>
800554e4:       e1a03000        mov     r3, r0
800554e8:       e583000c        str     r0, [r3, #12]
800554ec:       e5830000        str     r0, [r3]
800554f0:       e5830004        str     r0, [r3, #4]
800554f4:       e8bd8008        pop     {r3, pc}

GCC assumes memset returns the value of pointer 'waiter' in register r0; causing
register/memory corruptions.

This patch fixes the return value of the assembly version of memset.
It adds a 'mov' instruction and merges an additional load+store into
existing load/store instructions.
For ease of review, here is a breakdown of the patch into 4 simple steps:

Step 1
======
Perform the following substitutions:
ip -> r8, then
r0 -> ip,
and insert 'mov ip, r0' as the first statement of the function.
At this point, we have a memset() implementation returning the proper result,
but corrupting r8 on some paths (the ones that were using ip).

Step 2
======
Make sure r8 is saved and restored when (! CALGN(1)+0) == 1:

save r8:
-       str     lr, [sp, #-4]!
+       stmfd   sp!, {r8, lr}

and restore r8 on both exit paths:
-       ldmeqfd sp!, {pc}               @ Now <64 bytes to go.
+       ldmeqfd sp!, {r8, pc}           @ Now <64 bytes to go.
(...)
        tst     r2, #16
        stmneia ip!, {r1, r3, r8, lr}
-       ldr     lr, [sp], #4
+       ldmfd   sp!, {r8, lr}

Step 3
======
Make sure r8 is saved and restored when (! CALGN(1)+0) == 0:

save r8:
-       stmfd   sp!, {r4-r7, lr}
+       stmfd   sp!, {r4-r8, lr}

and restore r8 on both exit paths:
        bgt     3b
-       ldmeqfd sp!, {r4-r7, pc}
+       ldmeqfd sp!, {r4-r8, pc}
(...)
        tst     r2, #16
        stmneia ip!, {r4-r7}
-       ldmfd   sp!, {r4-r7, lr}
+       ldmfd   sp!, {r4-r8, lr}

Step 4
======
Rewrite register list "r4-r7, r8" as "r4-r8".

Signed-off-by: Ivan Djelic <ivan.djelic@parrot.com>
Reviewed-by: Nicolas Pitre <nico@linaro.org>
Signed-off-by: Dirk Behme <dirk.behme@gmail.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>

x86, doc: Be explicit about what the x86 struct boot_params requires

If the sentinel triggers, we do not want the boot loader authors to
just poke it and make the error go away, we want them to actually fix
the problem.

This should help avoid making the incorrect change in non-compliant
bootloaders.

[ hpa: dropped the Documentation/x86/boot.txt hunk pending
clarifications ]

Signed-off-by: Peter Jones <pjones@redhat.com>
Link: http://lkml.kernel.org/r/1362592823-28967-1-git-send-email-pjones@redhat.com
Signed-off-by: H. Peter Anvin <hpa@zytor.com>

x86: Don't clear efi_info even if the sentinel hits

When boot_params->sentinel is set, all we really know is that some
undefined set of fields in struct boot_params contain garbage. In the
particular case of efi_info, however, there is a private magic for
that substructure, so it is generally safe to leave it even if the
bootloader is broken.

kexec (for which we did the initial analysis) did not initialize this
field, but of course all the EFI bootloaders do, and most EFI
bootloaders are broken in this respect (and should be fixed.)

Reported-by: Robin Holt <holt@sgi.com>
Link: http://lkml.kernel.org/r/CA%2B5PVA51-FT14p4CRYKbicykugVb=PiaEycdQ57CK2km_OQuRQ@mail.gmail.com
Tested-by: Josh Boyer <jwboyer@gmail.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>

x86, mm: Make sure to find a 2M free block for the first mapped area

Henrik reported that his MacAir 3.1 would not boot with

| commit 8d57470d8f859635deffe3919d7d4867b488b85a
| Date: Fri Nov 16 19:38:58 2012 -0800
|
| x86, mm: setup page table in top-down

It turns out that we do not calculate the real_end properly:
We try to get 2M size with 4K alignment, and later will round down
to 2M, so we will get less then 2M for first mapping, in extreme
case could be only 4K only. In Henrik's system it has (1M-32K) as
last usable rage is [mem 0x7f9db000-0x7fef8fff].

The problem is exposed when EFI booting have several holes and it
will force mapping to use PTE instead as we only map usable areas.

To fix it, just make it be 2M aligned, so we can be guaranteed to be
able to use large pages to map it.

Reported-by: Henrik Rydberg <rydberg@euromail.se>
Bisected-by: Henrik Rydberg <rydberg@euromail.se>
Tested-by: Henrik Rydberg <rydberg@euromail.se>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/CAE9FiQX4nQ7_1kg5RL_vh56rmcSHXUi1ExrZX7CwED4NGMnHfg@mail.gmail.com
Signed-off-by: H. Peter Anvin <hpa@zytor.com>

x86: Fix 32-bit *_cpu_data initializers

The commit 27be457000211a6903968dfce06d5f73f051a217
('x86 idle: remove 32-bit-only "no-hlt" parameter, hlt_works_ok
flag') removed the hlt_works_ok flag from struct cpuinfo_x86, but
boot_cpu_data and new_cpu_data initializers were not changed
causing setting f00f_bug flag, instead of fdiv_bug.

If CONFIG_X86_F00F_BUG is not set the f00f_bug flag is never
cleared.

To avoid such problems in future C99-style initialization is now
used.

Signed-off-by: Krzysztof Mazur <krzysiek@podlesie.net>
Acked-by: Borislav Petkov <bp@suse.de>
Cc: len.brown@intel.com
Link: http://lkml.kernel.org/r/1362266082-2227-1-git-send-email-krzysiek@podlesie.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>

Merge branch 'drm-intel-fixes' of git://people.freedesktop.org/~danvet/drm-intel into drm-next

A bunch of fixes, nothing truely horrible:
- Fix PCH irq handling race which resulted in missed gmbus/dp aux irqs
  and subsequent fallout (Paulo)
- Fixup off-by-one in our hsw id table (Kenneth)
- Fixup ilk rc6 support (disabled by default), regression introduced in
  3.8
- g4x plane w/a from Egbert Eich
- gen2/3/4 dpms suspend/standy fixes for VGA outputs from Patrik Jakobsson
- Workaround dying ivb machines with less aggressive rc6 values (Stéphane
  Marchesin)

* 'drm-intel-fixes' of git://people.freedesktop.org/~danvet/drm-intel:
  drm/i915: Turn off hsync and vsync on ADPA when disabling crt
  drm/i915: Fix incorrect definition of ADPA HSYNC and VSYNC bits
  drm/i915: also disable south interrupts when handling them
  drm/i915: enable irqs earlier when resuming
  drm/i915: Increase the RC6p threshold.
  DRM/i915: On G45 enable cursor plane briefly after enabling the display plane.
  drm/i915: Fix Haswell/CRW PCI IDs.
  drm/i915: Don't clobber crtc->fb when queue_flip fails
  drm/i915: wait_event_timeout's timeout is in jiffies
  drm/i915: Fix missing variable initilization

ARM: 7667/1: perf: Fix section mismatch on armpmu_init()

WARNING: vmlinux.o(.text+0xfb80): Section mismatch in reference
from the function armpmu_register() to the function
.init.text:armpmu_init()
The function armpmu_register() references
the function __init armpmu_init().
This is often because armpmu_register lacks a __init
annotation or the annotation of armpmu_init is wrong.

Just drop the __init marking on armpmu_init() because
armpmu_register() no longer has an __init marking.

Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>

ARM: 7666/1: decompressor: add -mno-single-pic-base for building the decompressor

Before jumping to (position independent) C-code from the decompressor's
assembler world we set-up the C environment. This setup currently does not
set r9, which for arm-none-uclinux-uclibceabi toolchains is by default
expected to be the PIC offset base register (IE should point to the
beginning of the GOT).

Currently, therefore, in order to build working kernels that use the
decompressor it is necessary to use an arm-linux-gnueabi toolchain, or
similar. uClinux toolchains cause a prefetch abort to occur at the beginning
of the decompress_kernel function.

This patch allows uClinux toolchains to build bootable zImages by forcing
the -mno-single-pic-base option, which ensures that the location of the GOT
is re-derived each time it is required, and r9 becomes free for use as a
general purpose register.

This has a small (4% in instruction terms) advantage over the alternative of
setting r9 to point to the GOT before calling into the C-world.

Signed-off-by: Jonathan Austin <jonathan.austin@arm.com>
Acked-by: Nicolas Pitre <nico@linaro.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>

Merge branch 'pm-fixes' into fixes

* pm-fixes:
  cpufreq / intel_pstate: Do not load on VM that does not report max P state.
  cpufreq / intel_pstate: Fix intel_pstate_init() error path
  PM / QoS: Remove device PM QoS sysfs attributes at the right place
  PM / QoS: Fix concurrency issues and memory leaks in device PM QoS
  cpufreq: highbank: do not initialize array with a loop
  PM / OPP: improve introductory documentation
  cpufreq: Fix a typo in comment
  mailbox, pl320-ipc: remove __init from probe function

Merge branch 'acpi-fixes' into fixes

* acpi-fixes:
  ACPI / glue: Drop .find_bridge() callback from struct acpi_bus_type
  ACPI / glue: Add .match() callback to struct acpi_bus_type
  ACPI / porocessor: Beautify code, pr->id is u32 which is never < 0
  ACPI / processor: Remove redundant NULL check before kfree
  ACPI / Sleep: Avoid interleaved message on errors

cpufreq / intel_pstate: Do not load on VM that does not report max P state.

It seems some VMs support the P state MSRs but return zeros. Fail
gracefully if we are running in this environment.

References: https://bugzilla.redhat.com/show_bug.cgi?id=916833
Reported-by: Josh Boyer <jwboyer@redhat.com>
Signed-off-by: Dirk Brandewie <dirk.j.brandewie@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

cpufreq / intel_pstate: Fix intel_pstate_init() error path

If cpufreq_register_driver() fails just free memory that has been
allocated and return. intel_pstate_exit() function is removed since we
are built-in only now there is no reason for a module exit procedure.

Reported-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Dirk Brandewie <dirk.j.brandewie@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

drm/i915: Turn off hsync and vsync on ADPA when disabling crt

According to PRM we need to disable hsync and vsync even though ADPA is
disabled. The previous code did infact do the opposite so we fix it.

Signed-off-by: Patrik Jakobsson <patrik.r.jakobsson@gmail.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=56359
Tested-by: max <manikulin@gmail.com>
Cc: stable@vger.kernel.org
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>

efivarfs: return accurate error code in efivarfs_fill_super()

Joseph was hitting a failure case when mounting efivarfs which
resulted in an incorrect error message,

$ sudo mount -v /sys/firmware/efi/efivars mount: Cannot allocate memory

triggered when efivarfs_valid_name() returned -EINVAL.

Make sure we pass accurate return values up the stack if
efivarfs_fill_super() fails to build inodes for EFI variables.

Reported-by: Joseph Yasi <joe.yasi@gmail.com>
Reported-by: Lingzhu Xiang <lxiang@redhat.com>
Cc: Josh Boyer <jwboyer@redhat.com>
Cc: Jeremy Kerr <jk@ozlabs.org>
Cc: Matthew Garrett <mjg59@srcf.ucam.org>
Cc: <stable@vger.kernel.org> # v3.8
Signed-off-by: Matt Fleming <matt.fleming@intel.com>

efivars: efivarfs_valid_name() should handle pstore syntax

Stricter validation was introduced with commit da27a24383b2b
("efivarfs: guid part of filenames are case-insensitive") and commit
47f531e8ba3b ("efivarfs: Validate filenames much more aggressively"),
which is necessary for the guid portion of efivarfs filenames, but we
don't need to be so strict with the first part, the variable name. The
UEFI specification doesn't impose any constraints on variable names
other than they be a NULL-terminated string.

The above commits caused a regression that resulted in users seeing
the following message,

  $ sudo mount -v /sys/firmware/efi/efivars mount: Cannot allocate memory

whenever pstore EFI variables were present in the variable store,
since their variable names failed to pass the following check,

    /* GUID should be right after the first '-' */
    if (s - 1 != strchr(str, '-'))

as a typical pstore filename is of the form, dump-type0-10-1-<guid>.
The fix is trivial since the guid portion of the filename is GUID_LEN
bytes, we can use (len - GUID_LEN) to ensure the '-' character is
where we expect it to be.

(The bogus ENOMEM error value will be fixed in a separate patch.)

Reported-by: Joseph Yasi <joe.yasi@gmail.com>
Tested-by: Joseph Yasi <joe.yasi@gmail.com>
Reported-by: Lingzhu Xiang <lxiang@redhat.com>
Cc: Josh Boyer <jwboyer@redhat.com>
Cc: Jeremy Kerr <jk@ozlabs.org>
Cc: Matthew Garrett <mjg59@srcf.ucam.org>
Cc: <stable@vger.kernel.org> # v3.8
Signed-off-by: Matt Fleming <matt.fleming@intel.com>

efi: be more paranoid about available space when creating variables

UEFI variables are typically stored in flash. For various reasons, avaiable
space is typically not reclaimed immediately upon the deletion of a
variable - instead, the system will garbage collect during initialisation
after a reboot.

Some systems appear to handle this garbage collection extremely poorly,
failing if more than 50% of the system flash is in use. This can result in
the machine refusing to boot. The safest thing to do for the moment is to
forbid writes if they'd end up using more than half of the storage space.
We can make this more finegrained later if we come up with a method for
identifying the broken machines.

Signed-off-by: Matthew Garrett <matthew.garrett@nebula.com>
Cc: Josh Boyer <jwboyer@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>

drm/i915: Fix incorrect definition of ADPA HSYNC and VSYNC bits

Disable bits for ADPA HSYNC and VSYNC where mixed up resulting in suspend
becoming standby and vice versa. Fixed by swapping their bit position.

Reported-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Patrik Jakobsson <patrik.r.jakobsson@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>

iommu, x86: Add DMA remap fault reason

The number of DMA fault reasons in intel's document are from 1
to 0xD, but in dmar.c fault reason 0xD is not printed out.

In this document:

"Intel Virtualization Technology for Directed I/O Architecture Specification"
http://download.intel.com/technology/computing/vptech/Intel(r)_VT_for_Direct_IO.pdf

Chapter 4. Support For Device-IOTLBs

Table 6. Unsuccessful Translated Requests

There is fault reason for 0xD not listed in kernel:

    Present context-entry used to process translation request
    specifies blocking of Translation Requests (Translation Type (T)
    field value not equal to 01b).

This patch adds reason 0xD as well.

Signed-off-by: Li, Zhen-Hua <zhen-hual@hp.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Donald Dutile <ddutile@redhat.com>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Hannes Reinecke <hare@suse.de>
Link: http://lkml.kernel.org/r/1362537797-6034-1-git-send-email-zhen-hual@hp.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>

Merge branch 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc

Pull powerpc fixes from Ben Herrenschmidt:
"Here are a few powerpc bits & fixes for rc1.  A couple of str*cpy
  fixes, some fixes in handling the FSCR register on Power8 (controls
  the enabling of processor features), a 32-bit build fix and a couple
  more nits."

* 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc:
  powerpc: Set DSCR bit in FSCR setup
  powerpc: Add DSCR FSCR register bit definition
  powerpc: Fix setting FSCR for HV=0 and on secondary CPUs
  powerpc: Wireup the kcmp syscall to sys_ni
  powerpc: Remove unused BITOP_LE_SWIZZLE macro
  powerpc: Avoid link stack corruption in MMU on syscall entry path
  drivers/tty/hvc: Use strlcpy instead of strncpy
  powerpc/pseries/hvcserver: Fix strncpy buffer limit in location code
  powerpc: Fix compile of sha1-powerpc-asm.S on 32-bit

Merge tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux

Pull virtio hwrng fix from Rusty Russell:
"Nasty side-effect of vmalloc'ing modules: their static vars cannot be
  put into scatterlists.  Jens has a check queued for this, so it
  shouldn't happen again.

  We could fix this in virtio_rng, but it's actually far easier to just
  do it in the core"

* tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux:
  hw_random: make buffer usable in scatterlist.

Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net

Pull networking fixes from David Miller:
"A moderately sized pile of fixes, some specifically for merge window
  introduced regressions although others are for longer standing items
  and have been queued up for -stable.

  I'm kind of tired of all the RDS protocol bugs over the years, to be
  honest, it's way out of proportion to the number of people who
  actually use it.

   1) Fix missing range initialization in netfilter IPSET, from Jozsef
      Kadlecsik.

   2) ieee80211_local->tim_lock needs to use BH disabling, from Johannes
      Berg.

   3) Fix DMA syncing in SFC driver, from Ben Hutchings.

   4) Fix regression in BOND device MAC address setting, from Jiri
      Pirko.

   5) Missing usb_free_urb in ISDN Hisax driver, from Marina Makienko.

   6) Fix UDP checksumming in bnx2x driver for 57710 and 57711 chips,
      fix from Dmitry Kravkov.

   7) Missing cfgspace_lock initialization in BCMA driver.

   8) Validate parameter size for SCTP assoc stats getsockopt(), from
      Guenter Roeck.

   9) Fix SCTP association hangs, from Lee A Roberts.

  10) Fix jumbo frame handling in r8169, from Francois Romieu.

  11) Fix phy_device memory leak, from Petr Malat.

  12) Omit trailing FCS from frames received in BGMAC driver, from Hauke
      Mehrtens.

  13) Missing socket refcount release in L2TP, from Guillaume Nault.

  14) sctp_endpoint_init should respect passed in gfp_t, rather than use
      GFP_KERNEL unconditionally.  From Dan Carpenter.

  15) Add AISX AX88179 USB driver, from Freddy Xin.

  16) Remove MAINTAINERS entries for drivers deleted during the merge
      window, from Cesar Eduardo Barros.

  17) RDS protocol can try to allocate huge amounts of memory, check
      that the user's request length makes sense, from Cong Wang.

  18) SCTP should use the provided KMALLOC_MAX_SIZE instead of it's own,
      bogus, definition.  From Cong Wang.

  19) Fix deadlocks in FEC driver by moving TX reclaim into NAPI poll,
      from Frank Li.  Also, fix a build error introduced in the merge
      window.

  20) Fix bogus purging of default routes in ipv6, from Lorenzo Colitti.

  21) Don't double count RTT measurements when we leave the TCP receive
      fast path, from Neal Cardwell."

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (61 commits)
  tcp: fix double-counted receiver RTT when leaving receiver fast path
  CAIF: fix sparse warning for caif_usb
  rds: simplify a warning message
  net: fec: fix build error in no MXC platform
  net: ipv6: Don't purge default router if accept_ra=2
  net: fec: put tx to napi poll function to fix dead lock
  sctp: use KMALLOC_MAX_SIZE instead of its own MAX_KMALLOC_SIZE
  rds: limit the size allocated by rds_message_alloc()
  MAINTAINERS: remove eexpress
  MAINTAINERS: remove drivers/net/wan/cycx*
  MAINTAINERS: remove 3c505
  caif_dev: fix sparse warnings for caif_flow_cb
  ax88179_178a: ASIX AX88179_178A USB 3.0/2.0 to gigabit ethernet adapter driver
  sctp: use the passed in gfp flags instead GFP_KERNEL
  ipv[4|6]: correct dropwatch false positive in local_deliver_finish
  l2tp: Restore socket refcount when sendmsg succeeds
  net/phy: micrel: Disable asymmetric pause for KSZ9021
  bgmac: omit the fcs
  phy: Fix phy_device_free memory leak
  bnx2x: Fix KR2 work-around condition
  ...

Merge branch 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull irq fixes and cleanups from Thomas Gleixner:
"Commit e5ab012c3271 ("nohz: Make tick_nohz_irq_exit() irq safe") is
  the first commit in the series and the minimal necessary bugfix, which
  needs to go back into stable.

  The remanining commits enforce irq disabling in irq_exit(), sanitize
  the hardirq/softirq preempt count transition and remove a bunch of no
  longer necessary conditionals."

I personally love getting rid of the very subtle and confusing
IRQ_EXIT_OFFSET thing.  Even apart from the whole "more lines removed
than added" thing.

* 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  irq: Don't re-enable interrupts at the end of irq_exit
  irq: Remove IRQ_EXIT_OFFSET workaround
  Revert "nohz: Make tick_nohz_irq_exit() irq safe"
  irq: Sanitize invoke_softirq
  irq: Ensure irq_exit() code runs with interrupts disabled
  nohz: Make tick_nohz_irq_exit() irq safe

Merge branch 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull smpboot bugfix from Thomas Gleixner:
"A single bugfix for a regression introduced with the conversion of the
  stop machine threads to the generic smpboot thread management
  facility"

* 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  stop_machine: Mark per cpu stopper enabled early

Merge tag 'gpio-for-linus' of git://git.secretlab.ca/git/linux

Pull second round of GPIO changes from Grant Likely:
"This branch contains a few bug fixes that I missed the first time
  around and updates to the gpio_desc series included in the first pull
  request.  This tag has been retagged to drop the 2 head commits
  because the one of them caused a build failure."

* tag 'gpio-for-linus' of git://git.secretlab.ca/git/linux:
  gpio/gpio-ich: fix ichx_gpio_check_available() return what callers expect
  gpiolib: move comment to right function
  gpiolib: use const parameters when possible
  gpiolib: check descriptors validity before use

Merge tag 'md-3.9' of git://neil.brown.name/md

Pull md updates from NeilBrown:
"Mostly little bugfixes.

  Only "feature" is a new RAID10 layout which slightly improves the
  number of sets of devices that can concurrently fail, without data
  loss."

* tag 'md-3.9' of git://neil.brown.name/md:
  md: expedite metadata update when switching  read-auto -> active
  md: remove CONFIG_MULTICORE_RAID456
  md/raid1,raid10: fix deadlock with freeze_array()
  md/raid0: improve error message when converting RAID4-with-spares to RAID0
  md: raid0: fix error return from create_stripe_zones.
  md: fix two bugs when attempting to resize RAID0 array.
  DM RAID: Add support for MD's RAID10 "far" and "offset" algorithms
  MD RAID10: Improve redundancy for 'far' and 'offset' algorithms (part 2)
  MD RAID10: Improve redundancy for 'far' and 'offset' algorithms (part 1)
  MD RAID10: Minor non-functional code changes
  md: raid1,10: Handle REQ_WRITE_SAME flag in write bios
  md: protect against crash upon fsync on ro array

x86, smpboot: Remove unused variable

The cpuinfo_x86 ptr is unused now. Drop it. Got obsolete by 69fb3676df33
("x86 idle: remove mwait_idle() and "idle=mwait" cmdline param")
removing its only user.

[ hpa: fixes gcc warning ]

Signed-off-by: Borislav Petkov <bp@suse.de>
Link: http://lkml.kernel.org/r/1362428180-8865-2-git-send-email-bp@alien8.de
Cc: Len Brown <len.brown@intel.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>

drm/i915: also disable south interrupts when handling them

From the docs:

  "IIR can queue up to two interrupt events. When the IIR is cleared,
  it will set itself again after one clock if a second event was
  stored."

  "Only the rising edge of the PCH Display interrupt will cause the
  North Display IIR (DEIIR) PCH Display Interrupt even bit to be set,
  so all PCH Display Interrupts, including back to back interrupts,
  must be cleared before a new PCH Display interrupt can cause DEIIR
  to be set".

The current code works fine because we don't get many interrupts, but
if we enable the PCH FIFO underrun interrupts we'll start getting so
many interrupts that at some point new PCH interrupts won't cause
DEIIR to be set.

The initial implementation I tried was to turn the code that checks
SDEIIR into a loop, but we can still get interrupts even after the
loop is done (and before the irq handler finishes), so we have to
either disable the interrupts or mask them. In the end I concluded
that just disabling the PCH interrupts is enough, you don't even need
the loop, so this is what this patch implements. I've tested it and it
passes the 2 "PCH FIFO underrun interrupt storms" I can reproduce:
the "ironlake_crtc_disable" case and the "wrong watermarks" case.

In other words, here's how to reproduce the problem fixed by this
patch:
  1 - Enable PCH FIFO underrun interrupts (SERR_INT on SNB+)
  2 - Boot the machine
  3 - While booting we'll get tons of PCH FIFO underrun interrupts
  4 - Plug a new monitor
  5 - Run xrandr, notice it won't detect the new monitor
  6 - Read SDEIIR and notice it's not 0 while DEIIR is 0

Q: Can't we just clear DEIIR before SDEIIR?
A: It doesn't work. SDEIIR has to be completely cleared (including the
interrupts stored on its back queue) before it can flip DEIIR's bit to
1 again, and even while you're clearing it you'll be getting more and
more interrupts.

Q: Why does it work by just disabling+enabling the south interrupts?
A: Because when we re-enable them, if there's something on the SDEIIR
register (maybe an interrupt stored on the queue), the re-enabling
will make DEIIR's bit flip to 1, and since we'll already have
interrupts enabled we'll get another interrupt, then run our irq
handler again to process the "back" interrupts.

v2: Even bigger commit message, added code comments.

Note that this fixes missed dp aux irqs which have been reported for
3.9-rc1. This regression has been introduced by switching to
irq-driven dp aux transactions with

commit 9ee32fea5fe810ec06af3a15e4c65478de56d4f5
Author: Daniel Vetter <daniel.vetter@ffwll.ch>
Date:   Sat Dec 1 13:53:48 2012 +0100

    drm/i915: irq-drive the dp aux communication

References: http://www.mail-archive.com/intel-gfx@lists.freedesktop.org/msg18588.html
References: https://lkml.org/lkml/2013/2/26/769
Tested-by: Imre Deak <imre.deak@intel.com>
Reported-by: Sedat Dilek <sedat.dilek@gmail.com>
Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
[danvet: Pimp commit message with references for the dp aux irq
timeout regression this fixes.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>