[PATCH] powerpc/kernel: improve FP and vector registers restoration

classic Classic list List threaded Threaded
7 messages Options
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

[PATCH] powerpc/kernel: improve FP and vector registers restoration

Breno Leitao-2
Currently tsk->thread->load_vec and load_fp are not initialized during a
task creation, which set garbage to these variables (non-zero value).

These variables will be checked later at restore_math() to validate if the
FP and vectors are being utilized. Since these values might be non-zero,
the restore_math() will continue to save the FP and vectors even if they
were never utilized before the userspace application. load_fp and load_vec
counters will then overflow and the FP and Altivec will be finally
disabled, but before that condition is reached (counter overflow) several
context switches restored FP and vector registers without need, causing a
performance degradation.

Signed-off-by: Breno Leitao <[hidden email]>
Signed-off-by: Gustavo Romero <[hidden email]>
---
 arch/powerpc/kernel/process.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index baae104b16c7..a9435397eab8 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -1666,6 +1666,7 @@ void start_thread(struct pt_regs *regs, unsigned long start, unsigned long sp)
 #ifdef CONFIG_VSX
  current->thread.used_vsr = 0;
 #endif
+ current->thread.load_fp = 0;
  memset(&current->thread.fp_state, 0, sizeof(current->thread.fp_state));
  current->thread.fp_save_area = NULL;
 #ifdef CONFIG_ALTIVEC
@@ -1674,6 +1675,7 @@ void start_thread(struct pt_regs *regs, unsigned long start, unsigned long sp)
  current->thread.vr_save_area = NULL;
  current->thread.vrsave = 0;
  current->thread.used_vr = 0;
+ current->thread.load_vec = 0;
 #endif /* CONFIG_ALTIVEC */
 #ifdef CONFIG_SPE
  memset(current->thread.evr, 0, sizeof(current->thread.evr));
--
2.11.0

Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: [PATCH] powerpc/kernel: improve FP and vector registers restoration

Anton Blanchard
Hi Breno,

> Currently tsk->thread->load_vec and load_fp are not initialized
> during a task creation, which set garbage to these variables
> (non-zero value).

Nice catch! It seems like we should zero load_tm too though?

Acked-by: Anton Blanchard <[hidden email]>

Anton

> These variables will be checked later at restore_math() to validate
> if the FP and vectors are being utilized. Since these values might be
> non-zero, the restore_math() will continue to save the FP and vectors
> even if they were never utilized before the userspace application.
> load_fp and load_vec counters will then overflow and the FP and
> Altivec will be finally disabled, but before that condition is
> reached (counter overflow) several context switches restored FP and
> vector registers without need, causing a performance degradation.
>
> Signed-off-by: Breno Leitao <[hidden email]>
> Signed-off-by: Gustavo Romero <[hidden email]>
> ---
>  arch/powerpc/kernel/process.c | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/arch/powerpc/kernel/process.c
> b/arch/powerpc/kernel/process.c index baae104b16c7..a9435397eab8
> 100644 --- a/arch/powerpc/kernel/process.c
> +++ b/arch/powerpc/kernel/process.c
> @@ -1666,6 +1666,7 @@ void start_thread(struct pt_regs *regs,
> unsigned long start, unsigned long sp) #ifdef CONFIG_VSX
>   current->thread.used_vsr = 0;
>  #endif
> + current->thread.load_fp = 0;
>   memset(&current->thread.fp_state, 0,
> sizeof(current->thread.fp_state)); current->thread.fp_save_area =
> NULL; #ifdef CONFIG_ALTIVEC
> @@ -1674,6 +1675,7 @@ void start_thread(struct pt_regs *regs,
> unsigned long start, unsigned long sp) current->thread.vr_save_area =
> NULL; current->thread.vrsave = 0;
>   current->thread.used_vr = 0;
> + current->thread.load_vec = 0;
>  #endif /* CONFIG_ALTIVEC */
>  #ifdef CONFIG_SPE
>   memset(current->thread.evr, 0, sizeof(current->thread.evr));

Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: [PATCH] powerpc/kernel: improve FP and vector registers restoration

Breno Leitao-2
Hi Anton,

On Sat, Jun 03, 2017 at 08:04:11AM +1000, Anton Blanchard wrote:
> Hi Breno,
>
> > Currently tsk->thread->load_vec and load_fp are not initialized
> > during a task creation, which set garbage to these variables
> > (non-zero value).
>
> Nice catch! It seems like we should zero load_tm too though?

Yes, it seems we need to zero load_tm also, since it does not seem to be
zeroed anywhere else.

But I did some tests, and load_tm is always zero after start_thread()
is being called.

In fact, start_thread() is being called and pt_regs->load_tm is already
zero since the function start.

I also wrote a SystemTap script[1] to investigate it better, and I've
never seen a single load_tm != 0 in a my machine. I tested on both
POWER8 bare metal and KVM guests. (load_vec and load_fp happened to have
garbage all the time)

Any idea if this is just occasional event, or, if there is someone
zeroing it in an obscure code?

[1] https://github.com/leitao/htm_torture/blob/master/systemtap/load_tm_at_start_thread.stap
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: [PATCH] powerpc/kernel: improve FP and vector registers restoration

Anton Blanchard
On Sat, 3 Jun 2017 19:42:14 -0300
Breno Leitao <[hidden email]> wrote:

> Hi Anton,
>
> On Sat, Jun 03, 2017 at 08:04:11AM +1000, Anton Blanchard wrote:
> > Hi Breno,
> >  
> > > Currently tsk->thread->load_vec and load_fp are not initialized
> > > during a task creation, which set garbage to these variables
> > > (non-zero value).  
> >
> > Nice catch! It seems like we should zero load_tm too though?  
>
> Yes, it seems we need to zero load_tm also, since it does not seem to
> be zeroed anywhere else.
>
> But I did some tests, and load_tm is always zero after start_thread()
> is being called.
>
> In fact, start_thread() is being called and pt_regs->load_tm is
> already zero since the function start.
>
> I also wrote a SystemTap script[1] to investigate it better, and I've
> never seen a single load_tm != 0 in a my machine. I tested on both
> POWER8 bare metal and KVM guests. (load_vec and load_fp happened to
> have garbage all the time)
>
> Any idea if this is just occasional event, or, if there is someone
> zeroing it in an obscure code?

Quite likely no one uses TM :) Try:

#include <unistd.h>

int main(void)
{
        __builtin_tbegin(0);
        execlp("/bin/true", "/bin/true", NULL);
}

Anton
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: [PATCH] powerpc/kernel: improve FP and vector registers restoration

Breno Leitao-2
On Sun, Jun 04, 2017 at 11:38:14AM +1000, Anton Blanchard wrote:

> On Sat, 3 Jun 2017 19:42:14 -0300
> Breno Leitao <[hidden email]> wrote:
>
> > Hi Anton,
> >
> > On Sat, Jun 03, 2017 at 08:04:11AM +1000, Anton Blanchard wrote:
> > > Hi Breno,
> > >  
> > > > Currently tsk->thread->load_vec and load_fp are not initialized
> > > > during a task creation, which set garbage to these variables
> > > > (non-zero value).  
> > >
> > > Nice catch! It seems like we should zero load_tm too though?  
> >
> > Yes, it seems we need to zero load_tm also, since it does not seem to
> > be zeroed anywhere else.
> >
> > But I did some tests, and load_tm is always zero after start_thread()
> > is being called.
> >
> > In fact, start_thread() is being called and pt_regs->load_tm is
> > already zero since the function start.
> >
> > I also wrote a SystemTap script[1] to investigate it better, and I've
> > never seen a single load_tm != 0 in a my machine. I tested on both
> > POWER8 bare metal and KVM guests. (load_vec and load_fp happened to
> > have garbage all the time)
> >
> > Any idea if this is just occasional event, or, if there is someone
> > zeroing it in an obscure code?
>
> Quite likely no one uses TM :) Try:

In fact, I had tested with TM[1] and haven't seen any issue, but I was not
calling a nested application (through execve() syscall). Somehow if I
call  "$ ./tm_application ; /bin/true", I do not see a non-zero load_tm
in the new task->thread.

On the other side, I see the corruption with your test case, mainly if I
sleep after 'tbegin.' and before execlp(), giving a chance to have
load_tm incremented, and this value is being inherited in the new
task->thread.

This is obviously wrong, I will send a patch to have it fixed.

Thanks for the guidance!

[1] https://github.com/leitao/htm_torture
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: [PATCH] powerpc/kernel: improve FP and vector registers restoration

Michael Ellerman-2
In reply to this post by Breno Leitao-2
Breno Leitao <[hidden email]> writes:

> Currently tsk->thread->load_vec and load_fp are not initialized during a
> task creation, which set garbage to these variables (non-zero value).
>
> These variables will be checked later at restore_math() to validate if the
> FP and vectors are being utilized. Since these values might be non-zero,
> the restore_math() will continue to save the FP and vectors even if they
> were never utilized before the userspace application. load_fp and load_vec
> counters will then overflow and the FP and Altivec will be finally
> disabled, but before that condition is reached (counter overflow) several
> context switches restored FP and vector registers without need, causing a
> performance degradation.
>
> Signed-off-by: Breno Leitao <[hidden email]>
> Signed-off-by: Gustavo Romero <[hidden email]>

Thanks, I tweaked the wording a little and added:

Fixes: 70fe3d980f5f ("powerpc: Restore FPU/VEC/VSX if previously used")
Cc: [hidden email] # v4.6+

cheers
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: powerpc/kernel: improve FP and vector registers restoration

Michael Ellerman-3
In reply to this post by Breno Leitao-2
On Fri, 2017-06-02 at 21:43:30 UTC, Breno Leitao wrote:

> Currently tsk->thread->load_vec and load_fp are not initialized during a
> task creation, which set garbage to these variables (non-zero value).
>
> These variables will be checked later at restore_math() to validate if the
> FP and vectors are being utilized. Since these values might be non-zero,
> the restore_math() will continue to save the FP and vectors even if they
> were never utilized before the userspace application. load_fp and load_vec
> counters will then overflow and the FP and Altivec will be finally
> disabled, but before that condition is reached (counter overflow) several
> context switches restored FP and vector registers without need, causing a
> performance degradation.
>
> Signed-off-by: Breno Leitao <[hidden email]>
> Signed-off-by: Gustavo Romero <[hidden email]>
> Acked-by: Anton Blanchard <[hidden email]>

Applied to powerpc fixes, thanks.

https://git.kernel.org/powerpc/c/1195892c091a15cc862f4e202482a3

cheers
Loading...