[PATCH 0/8] Support for 24x7 hcall interface version 2

classic Classic list List threaded Threaded
15 messages Options
Reply | Threaded
Open this post in threaded view
|

[PATCH 0/8] Support for 24x7 hcall interface version 2

Thiago Jung Bauermann
Hello,

The hypervisor interface to access 24x7 performance counters (which collect
performance information from system power on to system power off) has been
extended in POWER9 adding new fields to the request and result element
structures.

Also, results for some domains now return more than one result element and
those need to be added to get a total count.

The first two patches fix bugs in the existing code. The following 4
patches are code improvements and the last two finally implement support
for the changes in POWER9 described above.

POWER8 systems only support version 1 of the interface, while POWER9
systems only support version 2. I tested these patches on POWER8 to verify
that there are no regressions, and also on POWER9 DD1.

Thiago Jung Bauermann (8):
  powerpc/perf/hv-24x7: Fix passing of catalog version number
  powerpc/perf/hv-24x7: Fix off-by-one error in request_buffer check
  powerpc/perf/hv-24x7: Properly iterate through results
  powerpc-perf/hx-24x7: Don't log failed hcall twice
  powerpc/perf/hv-24x7: Fix return value of hcalls
  powerpc/perf/hv-24x7: Minor improvements
  powerpc/perf/hv-24x7: Support v2 of the hypervisor API
  powerpc/perf/hv-24x7: Aggregate result elements on POWER9 SMT8

 arch/powerpc/perf/hv-24x7.c            | 255 +++++++++++++++++++++++++--------
 arch/powerpc/perf/hv-24x7.h            |  70 +++++++--
 arch/powerpc/platforms/pseries/Kconfig |   2 +-
 3 files changed, 255 insertions(+), 72 deletions(-)

--
2.7.4

Reply | Threaded
Open this post in threaded view
|

[PATCH 1/8] powerpc/perf/hv-24x7: Fix passing of catalog version number

Thiago Jung Bauermann
H_GET_24X7_CATALOG_PAGE needs to be passed the version number obtained from
the first catalog page obtained previously. This is a 64 bit number, but
create_events_from_catalog truncates it to 32-bit.

This worked on POWER8, but POWER9 actually uses the upper bits so the call
fails with H_P3 because the hypervisor doesn't recognize the version.

This patch also adds the hcall return code to the error message, which is
helpful when debugging the problem.

Fixes: 5c5cd7b50259 ("powerpc/perf/hv-24x7: parse catalog and populate sysfs with events")
Signed-off-by: Thiago Jung Bauermann <[hidden email]>
---
 arch/powerpc/perf/hv-24x7.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/perf/hv-24x7.c b/arch/powerpc/perf/hv-24x7.c
index 7b2ca16b1eb4..1354cde7095c 100644
--- a/arch/powerpc/perf/hv-24x7.c
+++ b/arch/powerpc/perf/hv-24x7.c
@@ -670,7 +670,7 @@ static int create_events_from_catalog(struct attribute ***events_,
        event_data_bytes, junk_events, event_idx, event_attr_ct, i,
        attr_max, event_idx_last, desc_ct, long_desc_ct;
  ssize_t ct, ev_len;
- uint32_t catalog_version_num;
+ uint64_t catalog_version_num;
  struct attribute **events, **event_descs, **event_long_descs;
  struct hv_24x7_catalog_page_0 *page_0 =
  kmem_cache_alloc(hv_page_cache, GFP_KERNEL);
@@ -706,8 +706,8 @@ static int create_events_from_catalog(struct attribute ***events_,
  event_data_offs   = be16_to_cpu(page_0->event_data_offs);
  event_data_len    = be16_to_cpu(page_0->event_data_len);
 
- pr_devel("cv %zu cl %zu eec %zu edo %zu edl %zu\n",
- (size_t)catalog_version_num, catalog_len,
+ pr_devel("cv %llu cl %zu eec %zu edo %zu edl %zu\n",
+ catalog_version_num, catalog_len,
  event_entry_count, event_data_offs, event_data_len);
 
  if ((MAX_4K < event_data_len)
@@ -761,8 +761,8 @@ static int create_events_from_catalog(struct attribute ***events_,
  catalog_version_num,
  i + event_data_offs);
  if (hret) {
- pr_err("failed to get event data in page %zu\n",
- i + event_data_offs);
+ pr_err("Failed to get event data in page %zu: rc=%ld\n",
+       i + event_data_offs, hret);
  ret = -EIO;
  goto e_event_data;
  }
--
2.7.4

Reply | Threaded
Open this post in threaded view
|

[PATCH 2/8] powerpc/perf/hv-24x7: Fix off-by-one error in request_buffer check

Thiago Jung Bauermann
In reply to this post by Thiago Jung Bauermann
request_buffer can hold 254 requests, so if it already has that number of
entries we can't add a new one.

Also, define constant to show where the number comes from.

Fixes: e3ee15dc5d19 ("powerpc/perf/hv-24x7: Define add_event_to_24x7_request()")
Signed-off-by: Thiago Jung Bauermann <[hidden email]>
---

Notes:
    This patch was already posted before:
   
    https://patchwork.ozlabs.org/patch/735948/

 arch/powerpc/perf/hv-24x7.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/perf/hv-24x7.c b/arch/powerpc/perf/hv-24x7.c
index 1354cde7095c..141de0f708f7 100644
--- a/arch/powerpc/perf/hv-24x7.c
+++ b/arch/powerpc/perf/hv-24x7.c
@@ -166,6 +166,10 @@ DEFINE_PER_CPU(struct hv_24x7_hw, hv_24x7_hw);
 DEFINE_PER_CPU(char, hv_24x7_reqb[H24x7_DATA_BUFFER_SIZE]) __aligned(4096);
 DEFINE_PER_CPU(char, hv_24x7_resb[H24x7_DATA_BUFFER_SIZE]) __aligned(4096);
 
+#define MAX_NUM_REQUESTS ((H24x7_DATA_BUFFER_SIZE -       \
+ sizeof(struct hv_24x7_request_buffer)) \
+ / sizeof(struct hv_24x7_request))
+
 static char *event_name(struct hv_24x7_event_data *ev, int *len)
 {
  *len = be16_to_cpu(ev->event_name_len) - 2;
@@ -1107,7 +1111,7 @@ static int add_event_to_24x7_request(struct perf_event *event,
  int i;
  struct hv_24x7_request *req;
 
- if (request_buffer->num_requests > 254) {
+ if (request_buffer->num_requests >= MAX_NUM_REQUESTS) {
  pr_devel("Too many requests for 24x7 HCALL %d\n",
  request_buffer->num_requests);
  return -EINVAL;
--
2.7.4

Reply | Threaded
Open this post in threaded view
|

[PATCH 3/8] powerpc/perf/hv-24x7: Properly iterate through results

Thiago Jung Bauermann
In reply to this post by Thiago Jung Bauermann
hv-24x7.h has a comment mentioning that result_buffer->results can't be
indexed as a normal array because it may contain results of variable sizes,
so fix the loop in h_24x7_event_commit_txn to take the variation into
account when iterating through results.

Another problem in that loop is that it sets h24x7hw->events[i] to NULL.
This assumes that only the i'th result maps to the i'th request, but that
is not guaranteed to be true. We need to leave the event in the array so
that we don't dereference a NULL pointer in case more than one result maps
to one request.

We still assume that each result has only one result element, so warn if
that assumption is violated.

Signed-off-by: Thiago Jung Bauermann <[hidden email]>
---
 arch/powerpc/perf/hv-24x7.c | 32 +++++++++++++++++++++++---------
 1 file changed, 23 insertions(+), 9 deletions(-)

diff --git a/arch/powerpc/perf/hv-24x7.c b/arch/powerpc/perf/hv-24x7.c
index 141de0f708f7..8b1a3e849d23 100644
--- a/arch/powerpc/perf/hv-24x7.c
+++ b/arch/powerpc/perf/hv-24x7.c
@@ -1144,7 +1144,9 @@ static int add_event_to_24x7_request(struct perf_event *event,
 
 static unsigned long single_24x7_request(struct perf_event *event, u64 *count)
 {
+ u16 num_elements;
  unsigned long ret;
+ struct hv_24x7_result *result;
  struct hv_24x7_request_buffer *request_buffer;
  struct hv_24x7_data_result_buffer *result_buffer;
 
@@ -1166,8 +1168,14 @@ static unsigned long single_24x7_request(struct perf_event *event, u64 *count)
  goto out;
  }
 
+ result = result_buffer->results;
+
+ /* This code assumes that a result has only one element. */
+ num_elements = be16_to_cpu(result->num_elements_returned);
+ WARN_ON_ONCE(num_elements != 1);
+
  /* process result from hcall */
- *count = be64_to_cpu(result_buffer->results[0].elements[0].element_data[0]);
+ *count = be64_to_cpu(result->elements[0].element_data[0]);
 
 out:
  put_cpu_var(hv_24x7_reqb);
@@ -1400,8 +1408,7 @@ static int h_24x7_event_commit_txn(struct pmu *pmu)
 {
  struct hv_24x7_request_buffer *request_buffer;
  struct hv_24x7_data_result_buffer *result_buffer;
- struct hv_24x7_result *resb;
- struct perf_event *event;
+ struct hv_24x7_result *res, *next_res;
  u64 count;
  int i, ret, txn_flags;
  struct hv_24x7_hw *h24x7hw;
@@ -1428,13 +1435,20 @@ static int h_24x7_event_commit_txn(struct pmu *pmu)
 
  h24x7hw = &get_cpu_var(hv_24x7_hw);
 
- /* Update event counts from hcall */
- for (i = 0; i < request_buffer->num_requests; i++) {
- resb = &result_buffer->results[i];
- count = be64_to_cpu(resb->elements[0].element_data[0]);
- event = h24x7hw->events[i];
- h24x7hw->events[i] = NULL;
+ /* Go through results in the result buffer to update event counts. */
+ for (i = 0, res = result_buffer->results;
+     i < result_buffer->num_results; i++, res = next_res) {
+ struct perf_event *event = h24x7hw->events[res->result_ix];
+ u16 num_elements = be16_to_cpu(res->num_elements_returned);
+ u16 data_size = be16_to_cpu(res->result_element_data_size);
+
+ /* This code assumes that a result has only one element. */
+ WARN_ON_ONCE(num_elements != 1);
+
+ count = be64_to_cpu(res->elements[0].element_data[0]);
  update_event_count(event, count);
+
+ next_res = (void *) res->elements[0].element_data + data_size;
  }
 
  put_cpu_var(hv_24x7_hw);
--
2.7.4

Reply | Threaded
Open this post in threaded view
|

[PATCH 4/8] powerpc-perf/hx-24x7: Don't log failed hcall twice

Thiago Jung Bauermann
In reply to this post by Thiago Jung Bauermann
make_24x7_request already calls log_24x7_hcall if it fails, so callers
don't have to do it again.

In fact, since the latter is now only called from the former, there's no
need for a separate log_24x7_hcall anymore so remove it.

Signed-off-by: Thiago Jung Bauermann <[hidden email]>
---
 arch/powerpc/perf/hv-24x7.c | 35 ++++++++++++-----------------------
 1 file changed, 12 insertions(+), 23 deletions(-)

diff --git a/arch/powerpc/perf/hv-24x7.c b/arch/powerpc/perf/hv-24x7.c
index 8b1a3e849d23..111c61ea7416 100644
--- a/arch/powerpc/perf/hv-24x7.c
+++ b/arch/powerpc/perf/hv-24x7.c
@@ -1044,21 +1044,6 @@ static const struct attribute_group *attr_groups[] = {
  NULL,
 };
 
-static void log_24x7_hcall(struct hv_24x7_request_buffer *request_buffer,
-   struct hv_24x7_data_result_buffer *result_buffer,
-   unsigned long ret)
-{
- struct hv_24x7_request *req;
-
- req = &request_buffer->requests[0];
- pr_notice_ratelimited("hcall failed: [%d %#x %#x %d] => "
- "ret 0x%lx (%ld) detail=0x%x failing ix=%x\n",
- req->performance_domain, req->data_offset,
- req->starting_ix, req->starting_lpar_ix, ret, ret,
- result_buffer->detailed_rc,
- result_buffer->failing_request_ix);
-}
-
 /*
  * Start the process for a new H_GET_24x7_DATA hcall.
  */
@@ -1091,8 +1076,16 @@ static int make_24x7_request(struct hv_24x7_request_buffer *request_buffer,
  virt_to_phys(request_buffer), H24x7_DATA_BUFFER_SIZE,
  virt_to_phys(result_buffer),  H24x7_DATA_BUFFER_SIZE);
 
- if (ret)
- log_24x7_hcall(request_buffer, result_buffer, ret);
+ if (ret) {
+ struct hv_24x7_request *req;
+
+ req = &request_buffer->requests[0];
+ pr_notice_ratelimited("hcall failed: [%d %#x %#x %d] => ret 0x%lx (%ld) detail=0x%x failing ix=%x\n",
+      req->performance_domain, req->data_offset,
+      req->starting_ix, req->starting_lpar_ix,
+      ret, ret, result_buffer->detailed_rc,
+      result_buffer->failing_request_ix);
+ }
 
  return ret;
 }
@@ -1163,10 +1156,8 @@ static unsigned long single_24x7_request(struct perf_event *event, u64 *count)
  goto out;
 
  ret = make_24x7_request(request_buffer, result_buffer);
- if (ret) {
- log_24x7_hcall(request_buffer, result_buffer, ret);
+ if (ret)
  goto out;
- }
 
  result = result_buffer->results;
 
@@ -1428,10 +1419,8 @@ static int h_24x7_event_commit_txn(struct pmu *pmu)
  result_buffer = (void *)get_cpu_var(hv_24x7_resb);
 
  ret = make_24x7_request(request_buffer, result_buffer);
- if (ret) {
- log_24x7_hcall(request_buffer, result_buffer, ret);
+ if (ret)
  goto put_reqb;
- }
 
  h24x7hw = &get_cpu_var(hv_24x7_hw);
 
--
2.7.4

Reply | Threaded
Open this post in threaded view
|

[PATCH 5/8] powerpc/perf/hv-24x7: Fix return value of hcalls

Thiago Jung Bauermann
In reply to this post by Thiago Jung Bauermann
The H_GET_24X7_CATALOG_PAGE hcall can return a signed error code, so fix
this in the code.

The H_GET_24X7_DATA hcall can return a signed error code, so fix this in
the code. Also, don't truncate it to 32 bit to use as return value for
make_24x7_request. In case of error h_24x7_event_commit_txn passes that
return value to generic code, so it should be a proper errno. The other
caller of make_24x7_request is single_24x7_request, whose callers don't
actually care which error code is returned so they are not affected by this
change.

Finally, h_24x7_get_value doesn't use the error code from
single_24x7_request, so there's no need to store it.

Signed-off-by: Thiago Jung Bauermann <[hidden email]>
---
 arch/powerpc/perf/hv-24x7.c | 28 +++++++++++++---------------
 1 file changed, 13 insertions(+), 15 deletions(-)

diff --git a/arch/powerpc/perf/hv-24x7.c b/arch/powerpc/perf/hv-24x7.c
index 111c61ea7416..7e47dcdea62a 100644
--- a/arch/powerpc/perf/hv-24x7.c
+++ b/arch/powerpc/perf/hv-24x7.c
@@ -264,9 +264,8 @@ static void *event_end(struct hv_24x7_event_data *ev, void *end)
  return start + nl + dl + ldl;
 }
 
-static unsigned long h_get_24x7_catalog_page_(unsigned long phys_4096,
-      unsigned long version,
-      unsigned long index)
+static long h_get_24x7_catalog_page_(unsigned long phys_4096,
+     unsigned long version, unsigned long index)
 {
  pr_devel("h_get_24x7_catalog_page(0x%lx, %lu, %lu)",
  phys_4096, version, index);
@@ -277,8 +276,7 @@ static unsigned long h_get_24x7_catalog_page_(unsigned long phys_4096,
  phys_4096, version, index);
 }
 
-static unsigned long h_get_24x7_catalog_page(char page[],
-     u64 version, u32 index)
+static long h_get_24x7_catalog_page(char page[], u64 version, u32 index)
 {
  return h_get_24x7_catalog_page_(virt_to_phys(page),
  version, index);
@@ -668,7 +666,7 @@ static int create_events_from_catalog(struct attribute ***events_,
       struct attribute ***event_descs_,
       struct attribute ***event_long_descs_)
 {
- unsigned long hret;
+ long hret;
  size_t catalog_len, catalog_page_len, event_entry_count,
        event_data_len, event_data_offs,
        event_data_bytes, junk_events, event_idx, event_attr_ct, i,
@@ -907,7 +905,7 @@ static ssize_t catalog_read(struct file *filp, struct kobject *kobj,
     struct bin_attribute *bin_attr, char *buf,
     loff_t offset, size_t count)
 {
- unsigned long hret;
+ long hret;
  ssize_t ret = 0;
  size_t catalog_len = 0, catalog_page_len = 0;
  loff_t page_offset = 0;
@@ -992,7 +990,7 @@ static ssize_t _name##_show(struct device *dev, \
     struct device_attribute *dev_attr, \
     char *buf) \
 { \
- unsigned long hret; \
+ long hret; \
  ssize_t ret = 0; \
  void *page = kmem_cache_alloc(hv_page_cache, GFP_USER); \
  struct hv_24x7_catalog_page_0 *page_0 = page; \
@@ -1065,7 +1063,7 @@ static void init_24x7_request(struct hv_24x7_request_buffer *request_buffer,
 static int make_24x7_request(struct hv_24x7_request_buffer *request_buffer,
      struct hv_24x7_data_result_buffer *result_buffer)
 {
- unsigned long ret;
+ long ret;
 
  /*
  * NOTE: Due to variable number of array elements in request and
@@ -1085,9 +1083,10 @@ static int make_24x7_request(struct hv_24x7_request_buffer *request_buffer,
       req->starting_ix, req->starting_lpar_ix,
       ret, ret, result_buffer->detailed_rc,
       result_buffer->failing_request_ix);
+ return -EIO;
  }
 
- return ret;
+ return 0;
 }
 
 /*
@@ -1135,10 +1134,10 @@ static int add_event_to_24x7_request(struct perf_event *event,
  return 0;
 }
 
-static unsigned long single_24x7_request(struct perf_event *event, u64 *count)
+static int single_24x7_request(struct perf_event *event, u64 *count)
 {
+ int ret;
  u16 num_elements;
- unsigned long ret;
  struct hv_24x7_result *result;
  struct hv_24x7_request_buffer *request_buffer;
  struct hv_24x7_data_result_buffer *result_buffer;
@@ -1253,10 +1252,9 @@ static int h_24x7_event_init(struct perf_event *event)
 
 static u64 h_24x7_get_value(struct perf_event *event)
 {
- unsigned long ret;
  u64 ct;
- ret = single_24x7_request(event, &ct);
- if (ret)
+
+ if (single_24x7_request(event, &ct))
  /* We checked this in event init, shouldn't fail here... */
  return 0;
 
--
2.7.4

Reply | Threaded
Open this post in threaded view
|

[PATCH 6/8] powerpc/perf/hv-24x7: Minor improvements

Thiago Jung Bauermann
In reply to this post by Thiago Jung Bauermann
There's an H24x7_DATA_BUFFER_SIZE constant, so use it in init_24x7_request.

There's also an HV_PERF_DOMAIN_MAX constant, so use it in
h_24x7_event_init. This makes the comment above the check redundant,
so remove it.

In add_event_to_24x7_request, a statement is terminated with a comma
instead of a semicolon. Fix it.

In hv-24x7.h, improve comments in struct hv_24x7_result.

Signed-off-by: Thiago Jung Bauermann <[hidden email]>
---
 arch/powerpc/perf/hv-24x7.c |  9 ++++-----
 arch/powerpc/perf/hv-24x7.h | 11 ++++++++++-
 2 files changed, 14 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/perf/hv-24x7.c b/arch/powerpc/perf/hv-24x7.c
index 7e47dcdea62a..043cbc78be98 100644
--- a/arch/powerpc/perf/hv-24x7.c
+++ b/arch/powerpc/perf/hv-24x7.c
@@ -1049,8 +1049,8 @@ static void init_24x7_request(struct hv_24x7_request_buffer *request_buffer,
       struct hv_24x7_data_result_buffer *result_buffer)
 {
 
- memset(request_buffer, 0, 4096);
- memset(result_buffer, 0, 4096);
+ memset(request_buffer, 0, H24x7_DATA_BUFFER_SIZE);
+ memset(result_buffer, 0, H24x7_DATA_BUFFER_SIZE);
 
  request_buffer->interface_version = HV_24X7_IF_VERSION_CURRENT;
  /* memset above set request_buffer->num_requests to 0 */
@@ -1126,7 +1126,7 @@ static int add_event_to_24x7_request(struct perf_event *event,
  req->performance_domain = event_get_domain(event);
  req->data_size = cpu_to_be16(8);
  req->data_offset = cpu_to_be32(event_get_offset(event));
- req->starting_lpar_ix = cpu_to_be16(event_get_lpar(event)),
+ req->starting_lpar_ix = cpu_to_be16(event_get_lpar(event));
  req->max_num_lpars = cpu_to_be16(1);
  req->starting_ix = cpu_to_be16(idx);
  req->max_ix = cpu_to_be16(1);
@@ -1218,9 +1218,8 @@ static int h_24x7_event_init(struct perf_event *event)
  return -EINVAL;
  }
 
- /* Domains above 6 are invalid */
  domain = event_get_domain(event);
- if (domain > 6) {
+ if (domain > HV_PERF_DOMAIN_MAX) {
  pr_devel("invalid domain %d\n", domain);
  return -EINVAL;
  }
diff --git a/arch/powerpc/perf/hv-24x7.h b/arch/powerpc/perf/hv-24x7.h
index 634ef4082cdc..b95909400b2a 100644
--- a/arch/powerpc/perf/hv-24x7.h
+++ b/arch/powerpc/perf/hv-24x7.h
@@ -71,6 +71,10 @@ struct hv_24x7_result_element {
 } __packed;
 
 struct hv_24x7_result {
+ /*
+ * The index of the 24x7 Request Structure in the 24x7 Request Buffer
+ * used to request this result.
+ */
  __u8 result_ix;
 
  /*
@@ -81,7 +85,12 @@ struct hv_24x7_result {
  __u8 results_complete;
  __be16 num_elements_returned;
 
- /* This is a copy of @data_size from the corresponding hv_24x7_request */
+ /*
+ * This is a copy of @data_size from the corresponding hv_24x7_request
+ *
+ * Warning: to obtain the size of each element in @elements you have
+ * to add the size of the other members of the result_element struct.
+ */
  __be16 result_element_data_size;
  __u8 reserved[0x2];
 
--
2.7.4

Reply | Threaded
Open this post in threaded view
|

[PATCH 7/8] powerpc/perf/hv-24x7: Support v2 of the hypervisor API

Thiago Jung Bauermann
In reply to this post by Thiago Jung Bauermann
POWER9 introduces a new version of the hypervisor API to access the 24x7
perf counters. The new version changed some of the structures used for
requests and results.

Signed-off-by: Thiago Jung Bauermann <[hidden email]>
---
 arch/powerpc/perf/hv-24x7.c            | 145 +++++++++++++++++++++++++++------
 arch/powerpc/perf/hv-24x7.h            |  59 ++++++++++++--
 arch/powerpc/platforms/pseries/Kconfig |   2 +-
 3 files changed, 173 insertions(+), 33 deletions(-)

diff --git a/arch/powerpc/perf/hv-24x7.c b/arch/powerpc/perf/hv-24x7.c
index 043cbc78be98..95c44f1d2fd2 100644
--- a/arch/powerpc/perf/hv-24x7.c
+++ b/arch/powerpc/perf/hv-24x7.c
@@ -18,6 +18,7 @@
 #include <linux/slab.h>
 #include <linux/vmalloc.h>
 
+#include <asm/cputhreads.h>
 #include <asm/firmware.h>
 #include <asm/hvcall.h>
 #include <asm/io.h>
@@ -27,6 +28,9 @@
 #include "hv-24x7-catalog.h"
 #include "hv-common.h"
 
+/* Version of the 24x7 hypervisor API that we should use in this machine. */
+static int interface_version;
+
 static bool domain_is_valid(unsigned domain)
 {
  switch (domain) {
@@ -74,7 +78,11 @@ static const char *domain_name(unsigned domain)
 
 static bool catalog_entry_domain_is_valid(unsigned domain)
 {
- return is_physical_domain(domain);
+ /* POWER8 doesn't support virtual domains. */
+ if (interface_version == 1)
+ return is_physical_domain(domain);
+ else
+ return domain_is_valid(domain);
 }
 
 /*
@@ -166,9 +174,12 @@ DEFINE_PER_CPU(struct hv_24x7_hw, hv_24x7_hw);
 DEFINE_PER_CPU(char, hv_24x7_reqb[H24x7_DATA_BUFFER_SIZE]) __aligned(4096);
 DEFINE_PER_CPU(char, hv_24x7_resb[H24x7_DATA_BUFFER_SIZE]) __aligned(4096);
 
-#define MAX_NUM_REQUESTS ((H24x7_DATA_BUFFER_SIZE -       \
+#define MAX_NUM_REQUESTS_V1 ((H24x7_DATA_BUFFER_SIZE -       \
+ sizeof(struct hv_24x7_request_buffer)) \
+ / H24x7_REQUEST_SIZE_V1)
+#define MAX_NUM_REQUESTS_V2 ((H24x7_DATA_BUFFER_SIZE -       \
  sizeof(struct hv_24x7_request_buffer)) \
- / sizeof(struct hv_24x7_request))
+ / H24x7_REQUEST_SIZE_V2)
 
 static char *event_name(struct hv_24x7_event_data *ev, int *len)
 {
@@ -1052,7 +1063,7 @@ static void init_24x7_request(struct hv_24x7_request_buffer *request_buffer,
  memset(request_buffer, 0, H24x7_DATA_BUFFER_SIZE);
  memset(result_buffer, 0, H24x7_DATA_BUFFER_SIZE);
 
- request_buffer->interface_version = HV_24X7_IF_VERSION_CURRENT;
+ request_buffer->interface_version = interface_version;
  /* memset above set request_buffer->num_requests to 0 */
 }
 
@@ -1077,7 +1088,7 @@ static int make_24x7_request(struct hv_24x7_request_buffer *request_buffer,
  if (ret) {
  struct hv_24x7_request *req;
 
- req = &request_buffer->requests[0];
+ req = request_buffer->requests;
  pr_notice_ratelimited("hcall failed: [%d %#x %#x %d] => ret 0x%lx (%ld) detail=0x%x failing ix=%x\n",
       req->performance_domain, req->data_offset,
       req->starting_ix, req->starting_lpar_ix,
@@ -1101,9 +1112,13 @@ static int add_event_to_24x7_request(struct perf_event *event,
 {
  u16 idx;
  int i;
+ size_t req_size;
  struct hv_24x7_request *req;
 
- if (request_buffer->num_requests >= MAX_NUM_REQUESTS) {
+ if ((request_buffer->interface_version == 1
+     && request_buffer->num_requests >= MAX_NUM_REQUESTS_V1)
+    || (request_buffer->interface_version > 1
+ && request_buffer->num_requests >= MAX_NUM_REQUESTS_V2)) {
  pr_devel("Too many requests for 24x7 HCALL %d\n",
  request_buffer->num_requests);
  return -EINVAL;
@@ -1120,8 +1135,11 @@ static int add_event_to_24x7_request(struct perf_event *event,
  idx = event_get_vcpu(event);
  }
 
+ req_size = request_buffer->interface_version == 1 ?
+   H24x7_REQUEST_SIZE_V1 : H24x7_REQUEST_SIZE_V2;
+
  i = request_buffer->num_requests++;
- req = &request_buffer->requests[i];
+ req = (void *) request_buffer->requests + i * req_size;
 
  req->performance_domain = event_get_domain(event);
  req->data_size = cpu_to_be16(8);
@@ -1131,14 +1149,97 @@ static int add_event_to_24x7_request(struct perf_event *event,
  req->starting_ix = cpu_to_be16(idx);
  req->max_ix = cpu_to_be16(1);
 
+ if (request_buffer->interface_version > 1 &&
+    req->performance_domain != HV_PERF_DOMAIN_PHYS_CHIP) {
+ req->starting_thread_group_ix = idx % 2;
+ req->max_num_thread_groups = 1;
+ }
+
  return 0;
 }
 
+/**
+ * get_count_from_result - get event count from the given result
+ *
+ * @event: Event associated with @res.
+ * @resb: Result buffer containing @res.
+ * @res: Result to work on.
+ * @countp: Output variable containing the event count.
+ * @next: Optional output variable pointing to the next result in @resb.
+ */
+static int get_count_from_result(struct perf_event *event,
+ struct hv_24x7_data_result_buffer *resb,
+ struct hv_24x7_result *res, u64 *countp,
+ struct hv_24x7_result **next)
+{
+ u16 num_elements = be16_to_cpu(res->num_elements_returned);
+ u16 data_size = be16_to_cpu(res->result_element_data_size);
+ unsigned int data_offset;
+ void *element_data;
+ int ret = 0;
+
+ /*
+ * We can bail out early if the result is empty.
+ */
+ if (!num_elements) {
+ pr_debug("Result of request %hhu is empty, nothing to do\n",
+ res->result_ix);
+
+ if (next)
+ *next = (struct hv_24x7_result *) res->elements;
+
+ return -ENODATA;
+ }
+
+ /*
+ * This code assumes that a result has only one element.
+ */
+ if (num_elements != 1) {
+ pr_debug("Error: result of request %hhu has %hu elements\n",
+ res->result_ix, num_elements);
+
+ if (!next)
+ return -ENOTSUPP;
+
+ /*
+ * We still need to go through the motions so that we can return
+ * a pointer to the next result.
+ */
+ ret = -ENOTSUPP;
+ }
+
+ if (data_size != sizeof(u64)) {
+ pr_debug("Error: result of request %hhu has data of %hu bytes\n",
+ res->result_ix, data_size);
+
+ if (!next)
+ return -ENOTSUPP;
+
+ ret = -ENOTSUPP;
+ }
+
+ if (resb->interface_version == 1)
+ data_offset = offsetof(struct hv_24x7_result_element_v1,
+       element_data);
+ else
+ data_offset = offsetof(struct hv_24x7_result_element_v2,
+       element_data);
+
+ element_data = res->elements + data_offset;
+
+ if (!ret)
+ *countp = be64_to_cpu(*((u64 *) element_data));
+
+ /* The next result is after the result element. */
+ if (next)
+ *next = element_data + data_size;
+
+ return ret;
+}
+
 static int single_24x7_request(struct perf_event *event, u64 *count)
 {
  int ret;
- u16 num_elements;
- struct hv_24x7_result *result;
  struct hv_24x7_request_buffer *request_buffer;
  struct hv_24x7_data_result_buffer *result_buffer;
 
@@ -1158,14 +1259,9 @@ static int single_24x7_request(struct perf_event *event, u64 *count)
  if (ret)
  goto out;
 
- result = result_buffer->results;
-
- /* This code assumes that a result has only one element. */
- num_elements = be16_to_cpu(result->num_elements_returned);
- WARN_ON_ONCE(num_elements != 1);
-
  /* process result from hcall */
- *count = be64_to_cpu(result->elements[0].element_data[0]);
+ ret = get_count_from_result(event, result_buffer,
+    result_buffer->results, count, NULL);
 
 out:
  put_cpu_var(hv_24x7_reqb);
@@ -1425,16 +1521,13 @@ static int h_24x7_event_commit_txn(struct pmu *pmu)
  for (i = 0, res = result_buffer->results;
      i < result_buffer->num_results; i++, res = next_res) {
  struct perf_event *event = h24x7hw->events[res->result_ix];
- u16 num_elements = be16_to_cpu(res->num_elements_returned);
- u16 data_size = be16_to_cpu(res->result_element_data_size);
 
- /* This code assumes that a result has only one element. */
- WARN_ON_ONCE(num_elements != 1);
+ ret = get_count_from_result(event, result_buffer, res, &count,
+    &next_res);
+ if (ret)
+ continue;
 
- count = be64_to_cpu(res->elements[0].element_data[0]);
  update_event_count(event, count);
-
- next_res = (void *) res->elements[0].element_data + data_size;
  }
 
  put_cpu_var(hv_24x7_hw);
@@ -1486,6 +1579,12 @@ static int hv_24x7_init(void)
  return -ENODEV;
  }
 
+ /* POWER8 only supports v1, while POWER9 only supports v2. */
+ if (cpu_has_feature(CPU_FTR_ARCH_300))
+ interface_version = 2;
+ else
+ interface_version = 1;
+
  hret = hv_perf_caps_get(&caps);
  if (hret) {
  pr_debug("could not obtain capabilities, not enabling, rc=%ld\n",
diff --git a/arch/powerpc/perf/hv-24x7.h b/arch/powerpc/perf/hv-24x7.h
index b95909400b2a..149af6e9538f 100644
--- a/arch/powerpc/perf/hv-24x7.h
+++ b/arch/powerpc/perf/hv-24x7.h
@@ -10,6 +10,9 @@ enum hv_perf_domains {
  HV_PERF_DOMAIN_MAX,
 };
 
+#define H24x7_REQUEST_SIZE_V1 16
+#define H24x7_REQUEST_SIZE_V2 32
+
 struct hv_24x7_request {
  /* PHYSICAL domains require enabling via phyp/hmc. */
  __u8 performance_domain;
@@ -42,19 +45,47 @@ struct hv_24x7_request {
  /* chip, core, or virtual processor based on @performance_domain */
  __be16 starting_ix;
  __be16 max_ix;
+
+ /* The following fields were added in v2 of the 24x7 interface. */
+
+ __u8 starting_thread_group_ix;
+
+ /* -1 means all thread groups starting at @starting_thread_group_ix */
+ __u8 max_num_thread_groups;
+
+ __u8 reserved2[0xE];
 } __packed;
 
 struct hv_24x7_request_buffer {
  /* 0 - ? */
  /* 1 - ? */
-#define HV_24X7_IF_VERSION_CURRENT 0x01
  __u8 interface_version;
  __u8 num_requests;
  __u8 reserved[0xE];
- struct hv_24x7_request requests[1];
+ struct hv_24x7_request requests[];
+} __packed;
+
+struct hv_24x7_result_element_v1 {
+ __be16 lpar_ix;
+
+ /*
+ * represents the core, chip, or virtual processor based on the
+ * request's @performance_domain
+ */
+ __be16 domain_ix;
+
+ /* -1 if @performance_domain does not refer to a virtual processor */
+ __be32 lpar_cfg_instance_id;
+
+ /* size = @result_element_data_size of containing result. */
+ __u64 element_data[];
 } __packed;
 
-struct hv_24x7_result_element {
+/*
+ * We need a separate struct for v2 because the offset of @element_data changed
+ * between versions.
+ */
+struct hv_24x7_result_element_v2 {
  __be16 lpar_ix;
 
  /*
@@ -66,8 +97,12 @@ struct hv_24x7_result_element {
  /* -1 if @performance_domain does not refer to a virtual processor */
  __be32 lpar_cfg_instance_id;
 
+ __u8 thread_group_ix;
+
+ __u8 reserved[7];
+
  /* size = @result_element_data_size of containing result. */
- __u64 element_data[1];
+ __u64 element_data[];
 } __packed;
 
 struct hv_24x7_result {
@@ -94,10 +129,16 @@ struct hv_24x7_result {
  __be16 result_element_data_size;
  __u8 reserved[0x2];
 
- /* WARNING: only valid for first result element due to variable sizes
- *          of result elements */
- /* struct hv_24x7_result_element[@num_elements_returned] */
- struct hv_24x7_result_element elements[1];
+ /*
+ * Either
+ * struct hv_24x7_result_element_v1[@num_elements_returned]
+ * or
+ * struct hv_24x7_result_element_v2[@num_elements_returned]
+ *
+ * depending on the interface_version field of the
+ * struct hv_24x7_data_result_buffer containing this result.
+ */
+ char elements[];
 } __packed;
 
 struct hv_24x7_data_result_buffer {
@@ -113,7 +154,7 @@ struct hv_24x7_data_result_buffer {
  __u8 reserved2[0x8];
  /* WARNING: only valid for the first result due to variable sizes of
  *    results */
- struct hv_24x7_result results[1]; /* [@num_results] */
+ struct hv_24x7_result results[]; /* [@num_results] */
 } __packed;
 
 #endif
diff --git a/arch/powerpc/platforms/pseries/Kconfig b/arch/powerpc/platforms/pseries/Kconfig
index 913c54e23eea..3a6dfd14f64b 100644
--- a/arch/powerpc/platforms/pseries/Kconfig
+++ b/arch/powerpc/platforms/pseries/Kconfig
@@ -124,7 +124,7 @@ config HV_PERF_CTRS
   Enable access to hypervisor supplied counters in perf. Currently,
   this enables code that uses the hcall GetPerfCounterInfo and 24x7
   interfaces to retrieve counters. GPCI exists on Power 6 and later
-  systems. 24x7 is available on Power 8 systems.
+  systems. 24x7 is available on Power 8 and later systems.
 
           If unsure, select Y.
 
--
2.7.4

Reply | Threaded
Open this post in threaded view
|

[PATCH 8/8] powerpc/perf/hv-24x7: Aggregate result elements on POWER9 SMT8

Thiago Jung Bauermann
In reply to this post by Thiago Jung Bauermann
On POWER9 SMT8 the 24x7 API returns two result elements for physical core
and virtual CPU events and we need to add their counts to get the final
result.

Signed-off-by: Thiago Jung Bauermann <[hidden email]>
---
 arch/powerpc/perf/hv-24x7.c | 58 ++++++++++++++++++++++++++++++++++-----------
 1 file changed, 44 insertions(+), 14 deletions(-)

diff --git a/arch/powerpc/perf/hv-24x7.c b/arch/powerpc/perf/hv-24x7.c
index 95c44f1d2fd2..641f385e7eb0 100644
--- a/arch/powerpc/perf/hv-24x7.c
+++ b/arch/powerpc/perf/hv-24x7.c
@@ -31,6 +31,9 @@
 /* Version of the 24x7 hypervisor API that we should use in this machine. */
 static int interface_version;
 
+/* Whether we have to aggregate result data for some domains. */
+static bool aggregate_result_elements;
+
 static bool domain_is_valid(unsigned domain)
 {
  switch (domain) {
@@ -58,6 +61,15 @@ static bool is_physical_domain(unsigned domain)
  }
 }
 
+/* Domains for which more than one result element are returned for each event. */
+static bool domain_needs_aggregation(unsigned int domain)
+{
+ return aggregate_result_elements &&
+ (domain == HV_PERF_DOMAIN_PHYS_CORE ||
+ (domain >= HV_PERF_DOMAIN_VCPU_HOME_CORE &&
+  domain <= HV_PERF_DOMAIN_VCPU_REMOTE_NODE));
+}
+
 static const char *domain_name(unsigned domain)
 {
  if (!domain_is_valid(domain))
@@ -1149,17 +1161,23 @@ static int add_event_to_24x7_request(struct perf_event *event,
  req->starting_ix = cpu_to_be16(idx);
  req->max_ix = cpu_to_be16(1);
 
- if (request_buffer->interface_version > 1 &&
-    req->performance_domain != HV_PERF_DOMAIN_PHYS_CHIP) {
- req->starting_thread_group_ix = idx % 2;
- req->max_num_thread_groups = 1;
+ if (request_buffer->interface_version > 1) {
+ if (domain_needs_aggregation(req->performance_domain))
+ req->max_num_thread_groups = -1;
+ else if (req->performance_domain != HV_PERF_DOMAIN_PHYS_CHIP) {
+ req->starting_thread_group_ix = idx % 2;
+ req->max_num_thread_groups = 1;
+ }
  }
 
  return 0;
 }
 
 /**
- * get_count_from_result - get event count from the given result
+ * get_count_from_result - get event count from all result elements in result
+ *
+ * If the event corresponding to this result needs aggregation of the result
+ * element values, then this function does that.
  *
  * @event: Event associated with @res.
  * @resb: Result buffer containing @res.
@@ -1176,7 +1194,8 @@ static int get_count_from_result(struct perf_event *event,
  u16 data_size = be16_to_cpu(res->result_element_data_size);
  unsigned int data_offset;
  void *element_data;
- int ret = 0;
+ int i, ret = 0;
+ u64 count;
 
  /*
  * We can bail out early if the result is empty.
@@ -1192,9 +1211,11 @@ static int get_count_from_result(struct perf_event *event,
  }
 
  /*
- * This code assumes that a result has only one element.
+ * This code assumes that a result has only one element, except
+ * when an event needs aggregation.
  */
- if (num_elements != 1) {
+ if (num_elements != 1 &&
+    !domain_needs_aggregation(event_get_domain(event))) {
  pr_debug("Error: result of request %hhu has %hu elements\n",
  res->result_ix, num_elements);
 
@@ -1225,14 +1246,19 @@ static int get_count_from_result(struct perf_event *event,
  data_offset = offsetof(struct hv_24x7_result_element_v2,
        element_data);
 
- element_data = res->elements + data_offset;
+ /* Go through the result elements in the result. */
+ for (i = count = 0, element_data = res->elements + data_offset;
+     i < num_elements;
+     i++, element_data += data_size + data_offset)
+ if (!ret)
+ count += be64_to_cpu(*((u64 *) element_data));
 
  if (!ret)
- *countp = be64_to_cpu(*((u64 *) element_data));
+ *countp = count;
 
- /* The next result is after the result element. */
+ /* The next result is after the last result element. */
  if (next)
- *next = element_data + data_size;
+ *next = element_data - data_offset;
 
  return ret;
 }
@@ -1580,9 +1606,13 @@ static int hv_24x7_init(void)
  }
 
  /* POWER8 only supports v1, while POWER9 only supports v2. */
- if (cpu_has_feature(CPU_FTR_ARCH_300))
+ if (cpu_has_feature(CPU_FTR_ARCH_300)) {
  interface_version = 2;
- else
+
+ /* SMT8 in POWER9 needs to aggregate result elements. */
+ if (threads_per_core == 8)
+ aggregate_result_elements = true;
+ } else
  interface_version = 1;
 
  hret = hv_perf_caps_get(&caps);
--
2.7.4

Reply | Threaded
Open this post in threaded view
|

Re: [PATCH 7/8] powerpc/perf/hv-24x7: Support v2 of the hypervisor API

Sukadev Bhattiprolu-2
In reply to this post by Thiago Jung Bauermann
Thiago Jung Bauermann [[hidden email]] wrote:

> POWER9 introduces a new version of the hypervisor API to access the 24x7
> perf counters. The new version changed some of the structures used for
> requests and results.
>
> Signed-off-by: Thiago Jung Bauermann <[hidden email]>
> ---
>  arch/powerpc/perf/hv-24x7.c            | 145 +++++++++++++++++++++++++++------
>  arch/powerpc/perf/hv-24x7.h            |  59 ++++++++++++--
>  arch/powerpc/platforms/pseries/Kconfig |   2 +-
>  3 files changed, 173 insertions(+), 33 deletions(-)
>
> diff --git a/arch/powerpc/perf/hv-24x7.c b/arch/powerpc/perf/hv-24x7.c
> index 043cbc78be98..95c44f1d2fd2 100644
> --- a/arch/powerpc/perf/hv-24x7.c
> +++ b/arch/powerpc/perf/hv-24x7.c
> @@ -18,6 +18,7 @@
>  #include <linux/slab.h>
>  #include <linux/vmalloc.h>
>
> +#include <asm/cputhreads.h>
>  #include <asm/firmware.h>
>  #include <asm/hvcall.h>
>  #include <asm/io.h>
> @@ -27,6 +28,9 @@
>  #include "hv-24x7-catalog.h"
>  #include "hv-common.h"
>
> +/* Version of the 24x7 hypervisor API that we should use in this machine. */
> +static int interface_version;
> +
>  static bool domain_is_valid(unsigned domain)
>  {
>   switch (domain) {
> @@ -74,7 +78,11 @@ static const char *domain_name(unsigned domain)
>
>  static bool catalog_entry_domain_is_valid(unsigned domain)
>  {
> - return is_physical_domain(domain);
> + /* POWER8 doesn't support virtual domains. */
> + if (interface_version == 1)
> + return is_physical_domain(domain);
> + else
> + return domain_is_valid(domain);
>  }
>
>  /*
> @@ -166,9 +174,12 @@ DEFINE_PER_CPU(struct hv_24x7_hw, hv_24x7_hw);
>  DEFINE_PER_CPU(char, hv_24x7_reqb[H24x7_DATA_BUFFER_SIZE]) __aligned(4096);
>  DEFINE_PER_CPU(char, hv_24x7_resb[H24x7_DATA_BUFFER_SIZE]) __aligned(4096);
>
> -#define MAX_NUM_REQUESTS ((H24x7_DATA_BUFFER_SIZE -       \
> +#define MAX_NUM_REQUESTS_V1 ((H24x7_DATA_BUFFER_SIZE -       \
> + sizeof(struct hv_24x7_request_buffer)) \
> + / H24x7_REQUEST_SIZE_V1)
> +#define MAX_NUM_REQUESTS_V2 ((H24x7_DATA_BUFFER_SIZE -       \
>   sizeof(struct hv_24x7_request_buffer)) \
> - / sizeof(struct hv_24x7_request))
> + / H24x7_REQUEST_SIZE_V2)

Nit: Can we define MAX_NUM_REQUESTS(version) - with a version parameter ? It
will...

>
>  static char *event_name(struct hv_24x7_event_data *ev, int *len)
>  {
> @@ -1052,7 +1063,7 @@ static void init_24x7_request(struct hv_24x7_request_buffer *request_buffer,
>   memset(request_buffer, 0, H24x7_DATA_BUFFER_SIZE);
>   memset(result_buffer, 0, H24x7_DATA_BUFFER_SIZE);
>
> - request_buffer->interface_version = HV_24X7_IF_VERSION_CURRENT;
> + request_buffer->interface_version = interface_version;
>   /* memset above set request_buffer->num_requests to 0 */
>  }
>
> @@ -1077,7 +1088,7 @@ static int make_24x7_request(struct hv_24x7_request_buffer *request_buffer,
>   if (ret) {
>   struct hv_24x7_request *req;
>
> - req = &request_buffer->requests[0];
> + req = request_buffer->requests;
>   pr_notice_ratelimited("hcall failed: [%d %#x %#x %d] => ret 0x%lx (%ld) detail=0x%x failing ix=%x\n",
>        req->performance_domain, req->data_offset,
>        req->starting_ix, req->starting_lpar_ix,
> @@ -1101,9 +1112,13 @@ static int add_event_to_24x7_request(struct perf_event *event,
>  {
>   u16 idx;
>   int i;
> + size_t req_size;
>   struct hv_24x7_request *req;
>
> - if (request_buffer->num_requests >= MAX_NUM_REQUESTS) {
> + if ((request_buffer->interface_version == 1
> +     && request_buffer->num_requests >= MAX_NUM_REQUESTS_V1)
> +    || (request_buffer->interface_version > 1
> + && request_buffer->num_requests >= MAX_NUM_REQUESTS_V2)) {
>   pr_devel("Too many requests for 24x7 HCALL %d\n",

...simplify this check to

        if (request->buffer->num_requests >= MAX_NUM_REQUESTS(version))

>   request_buffer->num_requests);
>   return -EINVAL;
> @@ -1120,8 +1135,11 @@ static int add_event_to_24x7_request(struct perf_event *event,
>   idx = event_get_vcpu(event);
>   }
>
> + req_size = request_buffer->interface_version == 1 ?
> +   H24x7_REQUEST_SIZE_V1 : H24x7_REQUEST_SIZE_V2;
> +

Maybe similarly, with H24x7_REQUEST_SIZE(version) ?

>   i = request_buffer->num_requests++;
> - req = &request_buffer->requests[i];
> + req = (void *) request_buffer->requests + i * req_size;
>
>   req->performance_domain = event_get_domain(event);
>   req->data_size = cpu_to_be16(8);
> @@ -1131,14 +1149,97 @@ static int add_event_to_24x7_request(struct perf_event *event,
>   req->starting_ix = cpu_to_be16(idx);
>   req->max_ix = cpu_to_be16(1);
>
> + if (request_buffer->interface_version > 1 &&
> +    req->performance_domain != HV_PERF_DOMAIN_PHYS_CHIP) {
> + req->starting_thread_group_ix = idx % 2;
> + req->max_num_thread_groups = 1;
> + }
> +
>   return 0;
>  }
>
> +/**
> + * get_count_from_result - get event count from the given result
> + *
> + * @event: Event associated with @res.
> + * @resb: Result buffer containing @res.
> + * @res: Result to work on.
> + * @countp: Output variable containing the event count.
> + * @next: Optional output variable pointing to the next result in @resb.
> + */
> +static int get_count_from_result(struct perf_event *event,
> + struct hv_24x7_data_result_buffer *resb,
> + struct hv_24x7_result *res, u64 *countp,
> + struct hv_24x7_result **next)
> +{
> + u16 num_elements = be16_to_cpu(res->num_elements_returned);
> + u16 data_size = be16_to_cpu(res->result_element_data_size);
> + unsigned int data_offset;
> + void *element_data;
> + int ret = 0;
> +
> + /*
> + * We can bail out early if the result is empty.
> + */
> + if (!num_elements) {
> + pr_debug("Result of request %hhu is empty, nothing to do\n",
> + res->result_ix);
> +
> + if (next)
> + *next = (struct hv_24x7_result *) res->elements;
> +
> + return -ENODATA;
> + }
> +
> + /*
> + * This code assumes that a result has only one element.
> + */
> + if (num_elements != 1) {
> + pr_debug("Error: result of request %hhu has %hu elements\n",
> + res->result_ix, num_elements);

Could this happen due to an user request or would this indicate a bug
in the way we submitted the request (perf should submit separate request
for each lpar/index - we set ->max_ix and ->max_num_lpars to cpu_be16(1).

Minor inconsistency with proceeding, is that if the next element passes,
this return code is lost/over written. IOW, h_24x7_event_commit_txn()'s
return value depends on the last element we process, even if intermediate
ones encounter an error.

> +
> + if (!next)
> + return -ENOTSUPP;
> +
> + /*
> + * We still need to go through the motions so that we can return
> + * a pointer to the next result.
> + */
> + ret = -ENOTSUPP;
> + }
> +
> + if (data_size != sizeof(u64)) {
> + pr_debug("Error: result of request %hhu has data of %hu bytes\n",
> + res->result_ix, data_size);
> +
> + if (!next)
> + return -ENOTSUPP;
> +
> + ret = -ENOTSUPP;
> + }
> +
> + if (resb->interface_version == 1)
> + data_offset = offsetof(struct hv_24x7_result_element_v1,
> +       element_data);
> + else
> + data_offset = offsetof(struct hv_24x7_result_element_v2,
> +       element_data);
> +
> + element_data = res->elements + data_offset;
> +
> + if (!ret)
> + *countp = be64_to_cpu(*((u64 *) element_data));
> +
> + /* The next result is after the result element. */
> + if (next)
> + *next = element_data + data_size;
> +
> + return ret;
> +}
> +
>  static int single_24x7_request(struct perf_event *event, u64 *count)
>  {
>   int ret;
> - u16 num_elements;
> - struct hv_24x7_result *result;
>   struct hv_24x7_request_buffer *request_buffer;
>   struct hv_24x7_data_result_buffer *result_buffer;
>
> @@ -1158,14 +1259,9 @@ static int single_24x7_request(struct perf_event *event, u64 *count)
>   if (ret)
>   goto out;
>
> - result = result_buffer->results;
> -
> - /* This code assumes that a result has only one element. */
> - num_elements = be16_to_cpu(result->num_elements_returned);
> - WARN_ON_ONCE(num_elements != 1);
> -
>   /* process result from hcall */
> - *count = be64_to_cpu(result->elements[0].element_data[0]);
> + ret = get_count_from_result(event, result_buffer,
> +    result_buffer->results, count, NULL);
>
>  out:
>   put_cpu_var(hv_24x7_reqb);
> @@ -1425,16 +1521,13 @@ static int h_24x7_event_commit_txn(struct pmu *pmu)
>   for (i = 0, res = result_buffer->results;
>       i < result_buffer->num_results; i++, res = next_res) {
>   struct perf_event *event = h24x7hw->events[res->result_ix];
> - u16 num_elements = be16_to_cpu(res->num_elements_returned);
> - u16 data_size = be16_to_cpu(res->result_element_data_size);
>
> - /* This code assumes that a result has only one element. */
> - WARN_ON_ONCE(num_elements != 1);
> + ret = get_count_from_result(event, result_buffer, res, &count,
> +    &next_res);
> + if (ret)
> + continue;
>
> - count = be64_to_cpu(res->elements[0].element_data[0]);
>   update_event_count(event, count);
> -
> - next_res = (void *) res->elements[0].element_data + data_size;
>   }
>
>   put_cpu_var(hv_24x7_hw);
> @@ -1486,6 +1579,12 @@ static int hv_24x7_init(void)
>   return -ENODEV;
>   }
>
> + /* POWER8 only supports v1, while POWER9 only supports v2. */
> + if (cpu_has_feature(CPU_FTR_ARCH_300))
> + interface_version = 2;
> + else
> + interface_version = 1;
> +
>   hret = hv_perf_caps_get(&caps);
>   if (hret) {
>   pr_debug("could not obtain capabilities, not enabling, rc=%ld\n",
> diff --git a/arch/powerpc/perf/hv-24x7.h b/arch/powerpc/perf/hv-24x7.h
> index b95909400b2a..149af6e9538f 100644
> --- a/arch/powerpc/perf/hv-24x7.h
> +++ b/arch/powerpc/perf/hv-24x7.h
> @@ -10,6 +10,9 @@ enum hv_perf_domains {
>   HV_PERF_DOMAIN_MAX,
>  };
>
> +#define H24x7_REQUEST_SIZE_V1 16
> +#define H24x7_REQUEST_SIZE_V2 32
> +
>  struct hv_24x7_request {
>   /* PHYSICAL domains require enabling via phyp/hmc. */
>   __u8 performance_domain;
> @@ -42,19 +45,47 @@ struct hv_24x7_request {
>   /* chip, core, or virtual processor based on @performance_domain */
>   __be16 starting_ix;
>   __be16 max_ix;
> +
> + /* The following fields were added in v2 of the 24x7 interface. */
> +
> + __u8 starting_thread_group_ix;
> +
> + /* -1 means all thread groups starting at @starting_thread_group_ix */
> + __u8 max_num_thread_groups;
> +
> + __u8 reserved2[0xE];
>  } __packed;
>
>  struct hv_24x7_request_buffer {
>   /* 0 - ? */
>   /* 1 - ? */
> -#define HV_24X7_IF_VERSION_CURRENT 0x01
>   __u8 interface_version;
>   __u8 num_requests;
>   __u8 reserved[0xE];
> - struct hv_24x7_request requests[1];
> + struct hv_24x7_request requests[];
> +} __packed;
> +
> +struct hv_24x7_result_element_v1 {
> + __be16 lpar_ix;
> +
> + /*
> + * represents the core, chip, or virtual processor based on the
> + * request's @performance_domain
> + */
> + __be16 domain_ix;
> +
> + /* -1 if @performance_domain does not refer to a virtual processor */
> + __be32 lpar_cfg_instance_id;
> +
> + /* size = @result_element_data_size of containing result. */
> + __u64 element_data[];
>  } __packed;
>
> -struct hv_24x7_result_element {
> +/*
> + * We need a separate struct for v2 because the offset of @element_data changed
> + * between versions.
> + */
> +struct hv_24x7_result_element_v2 {
>   __be16 lpar_ix;
>
>   /*
> @@ -66,8 +97,12 @@ struct hv_24x7_result_element {
>   /* -1 if @performance_domain does not refer to a virtual processor */
>   __be32 lpar_cfg_instance_id;
>
> + __u8 thread_group_ix;
> +
> + __u8 reserved[7];
> +
>   /* size = @result_element_data_size of containing result. */
> - __u64 element_data[1];
> + __u64 element_data[];
>  } __packed;
>
>  struct hv_24x7_result {
> @@ -94,10 +129,16 @@ struct hv_24x7_result {
>   __be16 result_element_data_size;
>   __u8 reserved[0x2];
>
> - /* WARNING: only valid for first result element due to variable sizes
> - *          of result elements */
> - /* struct hv_24x7_result_element[@num_elements_returned] */
> - struct hv_24x7_result_element elements[1];
> + /*
> + * Either
> + * struct hv_24x7_result_element_v1[@num_elements_returned]
> + * or
> + * struct hv_24x7_result_element_v2[@num_elements_returned]
> + *
> + * depending on the interface_version field of the
> + * struct hv_24x7_data_result_buffer containing this result.
> + */
> + char elements[];
>  } __packed;
>
>  struct hv_24x7_data_result_buffer {
> @@ -113,7 +154,7 @@ struct hv_24x7_data_result_buffer {
>   __u8 reserved2[0x8];
>   /* WARNING: only valid for the first result due to variable sizes of
>   *    results */
> - struct hv_24x7_result results[1]; /* [@num_results] */
> + struct hv_24x7_result results[]; /* [@num_results] */
>  } __packed;
>
>  #endif
> diff --git a/arch/powerpc/platforms/pseries/Kconfig b/arch/powerpc/platforms/pseries/Kconfig
> index 913c54e23eea..3a6dfd14f64b 100644
> --- a/arch/powerpc/platforms/pseries/Kconfig
> +++ b/arch/powerpc/platforms/pseries/Kconfig
> @@ -124,7 +124,7 @@ config HV_PERF_CTRS
>    Enable access to hypervisor supplied counters in perf. Currently,
>    this enables code that uses the hcall GetPerfCounterInfo and 24x7
>    interfaces to retrieve counters. GPCI exists on Power 6 and later
> -  systems. 24x7 is available on Power 8 systems.
> +  systems. 24x7 is available on Power 8 and later systems.
>
>            If unsure, select Y.
>
> --
> 2.7.4

Reply | Threaded
Open this post in threaded view
|

Re: [PATCH 0/8] Support for 24x7 hcall interface version 2

Sukadev Bhattiprolu-2
In reply to this post by Thiago Jung Bauermann
Thiago Jung Bauermann [[hidden email]] wrote:

> Hello,
>
> The hypervisor interface to access 24x7 performance counters (which collect
> performance information from system power on to system power off) has been
> extended in POWER9 adding new fields to the request and result element
> structures.
>
> Also, results for some domains now return more than one result element and
> those need to be added to get a total count.
>
> The first two patches fix bugs in the existing code. The following 4
> patches are code improvements and the last two finally implement support
> for the changes in POWER9 described above.
>
> POWER8 systems only support version 1 of the interface, while POWER9
> systems only support version 2. I tested these patches on POWER8 to verify
> that there are no regressions, and also on POWER9 DD1.
>
> Thiago Jung Bauermann (8):
>   powerpc/perf/hv-24x7: Fix passing of catalog version number
>   powerpc/perf/hv-24x7: Fix off-by-one error in request_buffer check
>   powerpc/perf/hv-24x7: Properly iterate through results
>   powerpc-perf/hx-24x7: Don't log failed hcall twice
>   powerpc/perf/hv-24x7: Fix return value of hcalls
>   powerpc/perf/hv-24x7: Minor improvements
>   powerpc/perf/hv-24x7: Support v2 of the hypervisor API
>   powerpc/perf/hv-24x7: Aggregate result elements on POWER9 SMT8
>
>  arch/powerpc/perf/hv-24x7.c            | 255 +++++++++++++++++++++++++--------
>  arch/powerpc/perf/hv-24x7.h            |  70 +++++++--
>  arch/powerpc/platforms/pseries/Kconfig |   2 +-
>  3 files changed, 255 insertions(+), 72 deletions(-)

Reviewed-by: Sukadev Bhattiprolu <[hidden email]>
>
> --
> 2.7.4

Reply | Threaded
Open this post in threaded view
|

Re: [PATCH 7/8] powerpc/perf/hv-24x7: Support v2 of the hypervisor API

Thiago Jung Bauermann
In reply to this post by Sukadev Bhattiprolu-2

Hello Suka,

Thanks for your review!

Sukadev Bhattiprolu <[hidden email]> writes:

> Thiago Jung Bauermann [[hidden email]] wrote:
>> @@ -166,9 +174,12 @@ DEFINE_PER_CPU(struct hv_24x7_hw, hv_24x7_hw);
>>  DEFINE_PER_CPU(char, hv_24x7_reqb[H24x7_DATA_BUFFER_SIZE]) __aligned(4096);
>>  DEFINE_PER_CPU(char, hv_24x7_resb[H24x7_DATA_BUFFER_SIZE]) __aligned(4096);
>>
>> -#define MAX_NUM_REQUESTS ((H24x7_DATA_BUFFER_SIZE -       \
>> +#define MAX_NUM_REQUESTS_V1 ((H24x7_DATA_BUFFER_SIZE -       \
>> + sizeof(struct hv_24x7_request_buffer)) \
>> + / H24x7_REQUEST_SIZE_V1)
>> +#define MAX_NUM_REQUESTS_V2 ((H24x7_DATA_BUFFER_SIZE -       \
>>   sizeof(struct hv_24x7_request_buffer)) \
>> - / sizeof(struct hv_24x7_request))
>> + / H24x7_REQUEST_SIZE_V2)
>
> Nit: Can we define MAX_NUM_REQUESTS(version) - with a version parameter ? It
> will...

That's a good idea. I created a function instead of a macro, I think it
makes the code clearer since the macro would be a bit harder to read.

>> @@ -1101,9 +1112,13 @@ static int add_event_to_24x7_request(struct perf_event *event,
>>  {
>>   u16 idx;
>>   int i;
>> + size_t req_size;
>>   struct hv_24x7_request *req;
>>
>> - if (request_buffer->num_requests >= MAX_NUM_REQUESTS) {
>> + if ((request_buffer->interface_version == 1
>> +     && request_buffer->num_requests >= MAX_NUM_REQUESTS_V1)
>> +    || (request_buffer->interface_version > 1
>> + && request_buffer->num_requests >= MAX_NUM_REQUESTS_V2)) {
>>   pr_devel("Too many requests for 24x7 HCALL %d\n",
>
> ...simplify this check to
>
> if (request->buffer->num_requests >= MAX_NUM_REQUESTS(version))

That's better indeed.

>>   request_buffer->num_requests);
>>   return -EINVAL;
>> @@ -1120,8 +1135,11 @@ static int add_event_to_24x7_request(struct perf_event *event,
>>   idx = event_get_vcpu(event);
>>   }
>>
>> + req_size = request_buffer->interface_version == 1 ?
>> +   H24x7_REQUEST_SIZE_V1 : H24x7_REQUEST_SIZE_V2;
>> +
>
> Maybe similarly, with H24x7_REQUEST_SIZE(version) ?

I implement this suggestion too.

>> +/**
>> + * get_count_from_result - get event count from the given result
>> + *
>> + * @event: Event associated with @res.
>> + * @resb: Result buffer containing @res.
>> + * @res: Result to work on.
>> + * @countp: Output variable containing the event count.
>> + * @next: Optional output variable pointing to the next result in @resb.
>> + */
>> +static int get_count_from_result(struct perf_event *event,
>> + struct hv_24x7_data_result_buffer *resb,
>> + struct hv_24x7_result *res, u64 *countp,
>> + struct hv_24x7_result **next)
>> +{
>> + u16 num_elements = be16_to_cpu(res->num_elements_returned);
>> + u16 data_size = be16_to_cpu(res->result_element_data_size);
>> + unsigned int data_offset;
>> + void *element_data;
>> + int ret = 0;
>> +
>> + /*
>> + * We can bail out early if the result is empty.
>> + */
>> + if (!num_elements) {
>> + pr_debug("Result of request %hhu is empty, nothing to do\n",
>> + res->result_ix);
>> +
>> + if (next)
>> + *next = (struct hv_24x7_result *) res->elements;
>> +
>> + return -ENODATA;
>> + }
>> +
>> + /*
>> + * This code assumes that a result has only one element.
>> + */
>> + if (num_elements != 1) {
>> + pr_debug("Error: result of request %hhu has %hu elements\n",
>> + res->result_ix, num_elements);
>
> Could this happen due to an user request or would this indicate a bug
> in the way we submitted the request (perf should submit separate request
> for each lpar/index - we set ->max_ix and ->max_num_lpars to cpu_be16(1).

That's a good question. I don't know to be honest.

> Minor inconsistency with proceeding, is that if the next element passes,
> this return code is lost/over written. IOW, h_24x7_event_commit_txn()'s
> return value depends on the last element we process, even if intermediate
> ones encounter an error.

Good point. Would it be better if h_24x7_event_commit_txn interrupted
processing and returned an error instead of continuing?

The version below addresses your comments, except the one above.

Subject: [PATCH 7/8] powerpc/perf/hv-24x7: Support v2 of the hypervisor API

POWER9 introduces a new version of the hypervisor API to access the 24x7
perf counters. The new version changed some of the structures used for
requests and results.

Signed-off-by: Thiago Jung Bauermann <[hidden email]>
---
 arch/powerpc/perf/hv-24x7.c            | 143 +++++++++++++++++++++++++++------
 arch/powerpc/perf/hv-24x7.h            |  58 ++++++++++---
 arch/powerpc/platforms/pseries/Kconfig |   2 +-
 3 files changed, 169 insertions(+), 34 deletions(-)

diff --git a/arch/powerpc/perf/hv-24x7.c b/arch/powerpc/perf/hv-24x7.c
index 043cbc78be98..35010403f191 100644
--- a/arch/powerpc/perf/hv-24x7.c
+++ b/arch/powerpc/perf/hv-24x7.c
@@ -18,6 +18,7 @@
 #include <linux/slab.h>
 #include <linux/vmalloc.h>
 
+#include <asm/cputhreads.h>
 #include <asm/firmware.h>
 #include <asm/hvcall.h>
 #include <asm/io.h>
@@ -27,6 +28,9 @@
 #include "hv-24x7-catalog.h"
 #include "hv-common.h"
 
+/* Version of the 24x7 hypervisor API that we should use in this machine. */
+static int interface_version;
+
 static bool domain_is_valid(unsigned domain)
 {
  switch (domain) {
@@ -74,7 +78,11 @@ static const char *domain_name(unsigned domain)
 
 static bool catalog_entry_domain_is_valid(unsigned domain)
 {
- return is_physical_domain(domain);
+ /* POWER8 doesn't support virtual domains. */
+ if (interface_version == 1)
+ return is_physical_domain(domain);
+ else
+ return domain_is_valid(domain);
 }
 
 /*
@@ -166,9 +174,11 @@ DEFINE_PER_CPU(struct hv_24x7_hw, hv_24x7_hw);
 DEFINE_PER_CPU(char, hv_24x7_reqb[H24x7_DATA_BUFFER_SIZE]) __aligned(4096);
 DEFINE_PER_CPU(char, hv_24x7_resb[H24x7_DATA_BUFFER_SIZE]) __aligned(4096);
 
-#define MAX_NUM_REQUESTS ((H24x7_DATA_BUFFER_SIZE -       \
- sizeof(struct hv_24x7_request_buffer)) \
- / sizeof(struct hv_24x7_request))
+static unsigned int max_num_requests(int interface_version)
+{
+ return (H24x7_DATA_BUFFER_SIZE - sizeof(struct hv_24x7_request_buffer))
+ / H24x7_REQUEST_SIZE(interface_version);
+}
 
 static char *event_name(struct hv_24x7_event_data *ev, int *len)
 {
@@ -1052,7 +1062,7 @@ static void init_24x7_request(struct hv_24x7_request_buffer *request_buffer,
  memset(request_buffer, 0, H24x7_DATA_BUFFER_SIZE);
  memset(result_buffer, 0, H24x7_DATA_BUFFER_SIZE);
 
- request_buffer->interface_version = HV_24X7_IF_VERSION_CURRENT;
+ request_buffer->interface_version = interface_version;
  /* memset above set request_buffer->num_requests to 0 */
 }
 
@@ -1077,7 +1087,7 @@ static int make_24x7_request(struct hv_24x7_request_buffer *request_buffer,
  if (ret) {
  struct hv_24x7_request *req;
 
- req = &request_buffer->requests[0];
+ req = request_buffer->requests;
  pr_notice_ratelimited("hcall failed: [%d %#x %#x %d] => ret 0x%lx (%ld) detail=0x%x failing ix=%x\n",
       req->performance_domain, req->data_offset,
       req->starting_ix, req->starting_lpar_ix,
@@ -1101,9 +1111,11 @@ static int add_event_to_24x7_request(struct perf_event *event,
 {
  u16 idx;
  int i;
+ size_t req_size;
  struct hv_24x7_request *req;
 
- if (request_buffer->num_requests >= MAX_NUM_REQUESTS) {
+ if (request_buffer->num_requests >=
+    max_num_requests(request_buffer->interface_version)) {
  pr_devel("Too many requests for 24x7 HCALL %d\n",
  request_buffer->num_requests);
  return -EINVAL;
@@ -1120,8 +1132,10 @@ static int add_event_to_24x7_request(struct perf_event *event,
  idx = event_get_vcpu(event);
  }
 
+ req_size = H24x7_REQUEST_SIZE(request_buffer->interface_version);
+
  i = request_buffer->num_requests++;
- req = &request_buffer->requests[i];
+ req = (void *) request_buffer->requests + i * req_size;
 
  req->performance_domain = event_get_domain(event);
  req->data_size = cpu_to_be16(8);
@@ -1131,14 +1145,97 @@ static int add_event_to_24x7_request(struct perf_event *event,
  req->starting_ix = cpu_to_be16(idx);
  req->max_ix = cpu_to_be16(1);
 
+ if (request_buffer->interface_version > 1 &&
+    req->performance_domain != HV_PERF_DOMAIN_PHYS_CHIP) {
+ req->starting_thread_group_ix = idx % 2;
+ req->max_num_thread_groups = 1;
+ }
+
  return 0;
 }
 
+/**
+ * get_count_from_result - get event count from the given result
+ *
+ * @event: Event associated with @res.
+ * @resb: Result buffer containing @res.
+ * @res: Result to work on.
+ * @countp: Output variable containing the event count.
+ * @next: Optional output variable pointing to the next result in @resb.
+ */
+static int get_count_from_result(struct perf_event *event,
+ struct hv_24x7_data_result_buffer *resb,
+ struct hv_24x7_result *res, u64 *countp,
+ struct hv_24x7_result **next)
+{
+ u16 num_elements = be16_to_cpu(res->num_elements_returned);
+ u16 data_size = be16_to_cpu(res->result_element_data_size);
+ unsigned int data_offset;
+ void *element_data;
+ int ret = 0;
+
+ /*
+ * We can bail out early if the result is empty.
+ */
+ if (!num_elements) {
+ pr_debug("Result of request %hhu is empty, nothing to do\n",
+ res->result_ix);
+
+ if (next)
+ *next = (struct hv_24x7_result *) res->elements;
+
+ return -ENODATA;
+ }
+
+ /*
+ * This code assumes that a result has only one element.
+ */
+ if (num_elements != 1) {
+ pr_debug("Error: result of request %hhu has %hu elements\n",
+ res->result_ix, num_elements);
+
+ if (!next)
+ return -ENOTSUPP;
+
+ /*
+ * We still need to go through the motions so that we can return
+ * a pointer to the next result.
+ */
+ ret = -ENOTSUPP;
+ }
+
+ if (data_size != sizeof(u64)) {
+ pr_debug("Error: result of request %hhu has data of %hu bytes\n",
+ res->result_ix, data_size);
+
+ if (!next)
+ return -ENOTSUPP;
+
+ ret = -ENOTSUPP;
+ }
+
+ if (resb->interface_version == 1)
+ data_offset = offsetof(struct hv_24x7_result_element_v1,
+       element_data);
+ else
+ data_offset = offsetof(struct hv_24x7_result_element_v2,
+       element_data);
+
+ element_data = res->elements + data_offset;
+
+ if (!ret)
+ *countp = be64_to_cpu(*((u64 *) element_data));
+
+ /* The next result is after the result element. */
+ if (next)
+ *next = element_data + data_size;
+
+ return ret;
+}
+
 static int single_24x7_request(struct perf_event *event, u64 *count)
 {
  int ret;
- u16 num_elements;
- struct hv_24x7_result *result;
  struct hv_24x7_request_buffer *request_buffer;
  struct hv_24x7_data_result_buffer *result_buffer;
 
@@ -1158,14 +1255,9 @@ static int single_24x7_request(struct perf_event *event, u64 *count)
  if (ret)
  goto out;
 
- result = result_buffer->results;
-
- /* This code assumes that a result has only one element. */
- num_elements = be16_to_cpu(result->num_elements_returned);
- WARN_ON_ONCE(num_elements != 1);
-
  /* process result from hcall */
- *count = be64_to_cpu(result->elements[0].element_data[0]);
+ ret = get_count_from_result(event, result_buffer,
+    result_buffer->results, count, NULL);
 
 out:
  put_cpu_var(hv_24x7_reqb);
@@ -1425,16 +1517,13 @@ static int h_24x7_event_commit_txn(struct pmu *pmu)
  for (i = 0, res = result_buffer->results;
      i < result_buffer->num_results; i++, res = next_res) {
  struct perf_event *event = h24x7hw->events[res->result_ix];
- u16 num_elements = be16_to_cpu(res->num_elements_returned);
- u16 data_size = be16_to_cpu(res->result_element_data_size);
 
- /* This code assumes that a result has only one element. */
- WARN_ON_ONCE(num_elements != 1);
+ ret = get_count_from_result(event, result_buffer, res, &count,
+    &next_res);
+ if (ret)
+ continue;
 
- count = be64_to_cpu(res->elements[0].element_data[0]);
  update_event_count(event, count);
-
- next_res = (void *) res->elements[0].element_data + data_size;
  }
 
  put_cpu_var(hv_24x7_hw);
@@ -1486,6 +1575,12 @@ static int hv_24x7_init(void)
  return -ENODEV;
  }
 
+ /* POWER8 only supports v1, while POWER9 only supports v2. */
+ if (cpu_has_feature(CPU_FTR_ARCH_300))
+ interface_version = 2;
+ else
+ interface_version = 1;
+
  hret = hv_perf_caps_get(&caps);
  if (hret) {
  pr_debug("could not obtain capabilities, not enabling, rc=%ld\n",
diff --git a/arch/powerpc/perf/hv-24x7.h b/arch/powerpc/perf/hv-24x7.h
index b95909400b2a..5092c4a222a6 100644
--- a/arch/powerpc/perf/hv-24x7.h
+++ b/arch/powerpc/perf/hv-24x7.h
@@ -10,6 +10,8 @@ enum hv_perf_domains {
  HV_PERF_DOMAIN_MAX,
 };
 
+#define H24x7_REQUEST_SIZE(iface_version) (iface_version == 1 ? 16 : 32)
+
 struct hv_24x7_request {
  /* PHYSICAL domains require enabling via phyp/hmc. */
  __u8 performance_domain;
@@ -42,19 +44,47 @@ struct hv_24x7_request {
  /* chip, core, or virtual processor based on @performance_domain */
  __be16 starting_ix;
  __be16 max_ix;
+
+ /* The following fields were added in v2 of the 24x7 interface. */
+
+ __u8 starting_thread_group_ix;
+
+ /* -1 means all thread groups starting at @starting_thread_group_ix */
+ __u8 max_num_thread_groups;
+
+ __u8 reserved2[0xE];
 } __packed;
 
 struct hv_24x7_request_buffer {
  /* 0 - ? */
  /* 1 - ? */
-#define HV_24X7_IF_VERSION_CURRENT 0x01
  __u8 interface_version;
  __u8 num_requests;
  __u8 reserved[0xE];
- struct hv_24x7_request requests[1];
+ struct hv_24x7_request requests[];
+} __packed;
+
+struct hv_24x7_result_element_v1 {
+ __be16 lpar_ix;
+
+ /*
+ * represents the core, chip, or virtual processor based on the
+ * request's @performance_domain
+ */
+ __be16 domain_ix;
+
+ /* -1 if @performance_domain does not refer to a virtual processor */
+ __be32 lpar_cfg_instance_id;
+
+ /* size = @result_element_data_size of containing result. */
+ __u64 element_data[];
 } __packed;
 
-struct hv_24x7_result_element {
+/*
+ * We need a separate struct for v2 because the offset of @element_data changed
+ * between versions.
+ */
+struct hv_24x7_result_element_v2 {
  __be16 lpar_ix;
 
  /*
@@ -66,8 +96,12 @@ struct hv_24x7_result_element {
  /* -1 if @performance_domain does not refer to a virtual processor */
  __be32 lpar_cfg_instance_id;
 
+ __u8 thread_group_ix;
+
+ __u8 reserved[7];
+
  /* size = @result_element_data_size of containing result. */
- __u64 element_data[1];
+ __u64 element_data[];
 } __packed;
 
 struct hv_24x7_result {
@@ -94,10 +128,16 @@ struct hv_24x7_result {
  __be16 result_element_data_size;
  __u8 reserved[0x2];
 
- /* WARNING: only valid for first result element due to variable sizes
- *          of result elements */
- /* struct hv_24x7_result_element[@num_elements_returned] */
- struct hv_24x7_result_element elements[1];
+ /*
+ * Either
+ * struct hv_24x7_result_element_v1[@num_elements_returned]
+ * or
+ * struct hv_24x7_result_element_v2[@num_elements_returned]
+ *
+ * depending on the interface_version field of the
+ * struct hv_24x7_data_result_buffer containing this result.
+ */
+ char elements[];
 } __packed;
 
 struct hv_24x7_data_result_buffer {
@@ -113,7 +153,7 @@ struct hv_24x7_data_result_buffer {
  __u8 reserved2[0x8];
  /* WARNING: only valid for the first result due to variable sizes of
  *    results */
- struct hv_24x7_result results[1]; /* [@num_results] */
+ struct hv_24x7_result results[]; /* [@num_results] */
 } __packed;
 
 #endif
diff --git a/arch/powerpc/platforms/pseries/Kconfig b/arch/powerpc/platforms/pseries/Kconfig
index 913c54e23eea..3a6dfd14f64b 100644
--- a/arch/powerpc/platforms/pseries/Kconfig
+++ b/arch/powerpc/platforms/pseries/Kconfig
@@ -124,7 +124,7 @@ config HV_PERF_CTRS
   Enable access to hypervisor supplied counters in perf. Currently,
   this enables code that uses the hcall GetPerfCounterInfo and 24x7
   interfaces to retrieve counters. GPCI exists on Power 6 and later
-  systems. 24x7 is available on Power 8 systems.
+  systems. 24x7 is available on Power 8 and later systems.
 
           If unsure, select Y.
 
--
2.7.4

Reply | Threaded
Open this post in threaded view
|

Re: [PATCH 0/8] Support for 24x7 hcall interface version 2

Michael Ellerman-2
In reply to this post by Thiago Jung Bauermann
Thiago Jung Bauermann <[hidden email]> writes:

> Hello,
>
> The hypervisor interface to access 24x7 performance counters (which collect
> performance information from system power on to system power off) has been
> extended in POWER9 adding new fields to the request and result element
> structures.
>
> Also, results for some domains now return more than one result element and
> those need to be added to get a total count.
>
> The first two patches fix bugs in the existing code. The following 4
> patches are code improvements and the last two finally implement support
> for the changes in POWER9 described above.
>
> POWER8 systems only support version 1 of the interface, while POWER9
> systems only support version 2. I tested these patches on POWER8 to verify
> that there are no regressions, and also on POWER9 DD1.

Where is version 2 documented?

And what happens when we boot on a POWER9 in POWER8 compatibility mode?

cheers
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH 0/8] Support for 24x7 hcall interface version 2

Thiago Jung Bauermann

Michael Ellerman <[hidden email]> writes:

> Thiago Jung Bauermann <[hidden email]> writes:
>> The hypervisor interface to access 24x7 performance counters (which collect
>> performance information from system power on to system power off) has been
>> extended in POWER9 adding new fields to the request and result element
>> structures.
>>
>> Also, results for some domains now return more than one result element and
>> those need to be added to get a total count.
>>
>> The first two patches fix bugs in the existing code. The following 4
>> patches are code improvements and the last two finally implement support
>> for the changes in POWER9 described above.
>>
>> POWER8 systems only support version 1 of the interface, while POWER9
>> systems only support version 2. I tested these patches on POWER8 to verify
>> that there are no regressions, and also on POWER9 DD1.
>
> Where is version 2 documented?

Only in an internal specification.

> And what happens when we boot on a POWER9 in POWER8 compatibility mode?

It will still use version 2 of the API, and still require aggregation of
result elements. Does this mean that hv_24x7_init should check
cur_cpu_spec->oprofile_cpu_type for "ppc64/power9" instead of
cpu_has_feature(CPU_FTR_ARCH_300)?

There were a couple of comments from Suka which still needed resolving:

Sukadev Bhattiprolu <[hidden email]> writes:

> Thiago Jung Bauermann [[hidden email]] wrote:
>> +/**
>> + * get_count_from_result - get event count from the given result
>> + *
>> + * @event: Event associated with @res.
>> + * @resb: Result buffer containing @res.
>> + * @res: Result to work on.
>> + * @countp: Output variable containing the event count.
>> + * @next: Optional output variable pointing to the next result in @resb.
>> + */
>> +static int get_count_from_result(struct perf_event *event,
>> + struct hv_24x7_data_result_buffer *resb,
>> + struct hv_24x7_result *res, u64 *countp,
>> + struct hv_24x7_result **next)
>> +{
>> + u16 num_elements = be16_to_cpu(res->num_elements_returned);
>> + u16 data_size = be16_to_cpu(res->result_element_data_size);
>> + unsigned int data_offset;
>> + void *element_data;
>> + int ret = 0;
>> +
>> + /*
>> + * We can bail out early if the result is empty.
>> + */
>> + if (!num_elements) {
>> + pr_debug("Result of request %hhu is empty, nothing to do\n",
>> + res->result_ix);
>> +
>> + if (next)
>> + *next = (struct hv_24x7_result *) res->elements;
>> +
>> + return -ENODATA;
>> + }
>> +
>> + /*
>> + * This code assumes that a result has only one element.
>> + */
>> + if (num_elements != 1) {
>> + pr_debug("Error: result of request %hhu has %hu elements\n",
>> + res->result_ix, num_elements);
>
> Could this happen due to an user request or would this indicate a bug
> in the way we submitted the request (perf should submit separate request
> for each lpar/index - we set ->max_ix and ->max_num_lpars to cpu_be16(1).

By specifying 1 as the maximum for the smallest resource we're
requesting, we guarantee one element per result. Since that's what we
do this would indeed be a bug. For v2 (which I'll send shortly) I
changed the message above to a pr_err, and changed the returned error to
-EIO instead of -ENOTSUPP.

> Minor inconsistency with proceeding, is that if the next element passes,
> this return code is lost/over written. IOW, h_24x7_event_commit_txn()'s
> return value depends on the last element we process, even if intermediate
> ones encounter an error.

For v2, I changed h_24x7_event_commit_txn to break out of the loop if
there's an error in any processed result. Also, since in that case @next
isn't used anymore, get_count_from_result will exit early instead of
going through the motions.

>> +
>> + if (!next)
>> + return -ENOTSUPP;
>> +
>> + /*
>> + * We still need to go through the motions so that we can return
>> + * a pointer to the next result.
>> + */
>> + ret = -ENOTSUPP;
>> + }
>> +
>> + if (data_size != sizeof(u64)) {
>> + pr_debug("Error: result of request %hhu has data of %hu bytes\n",
>> + res->result_ix, data_size);
>> +
>> + if (!next)
>> + return -ENOTSUPP;
>> +
>> + ret = -ENOTSUPP;
>> + }
>> +
>> + if (resb->interface_version == 1)
>> + data_offset = offsetof(struct hv_24x7_result_element_v1,
>> +       element_data);
>> + else
>> + data_offset = offsetof(struct hv_24x7_result_element_v2,
>> +       element_data);
>> +
>> + element_data = res->elements + data_offset;
>> +
>> + if (!ret)
>> + *countp = be64_to_cpu(*((u64 *) element_data));
>> +
>> + /* The next result is after the result element. */
>> + if (next)
>> + *next = element_data + data_size;
>> +
>> + return ret;
>> +}
>> +
>>  static int single_24x7_request(struct perf_event *event, u64 *count)
>>  {
>>   int ret;
>> - u16 num_elements;
>> - struct hv_24x7_result *result;
>>   struct hv_24x7_request_buffer *request_buffer;
>>   struct hv_24x7_data_result_buffer *result_buffer;
>>
>> @@ -1158,14 +1259,9 @@ static int single_24x7_request(struct perf_event *event, u64 *count)
>>   if (ret)
>>   goto out;
>>
>> - result = result_buffer->results;
>> -
>> - /* This code assumes that a result has only one element. */
>> - num_elements = be16_to_cpu(result->num_elements_returned);
>> - WARN_ON_ONCE(num_elements != 1);
>> -
>>   /* process result from hcall */
>> - *count = be64_to_cpu(result->elements[0].element_data[0]);
>> + ret = get_count_from_result(event, result_buffer,
>> +    result_buffer->results, count, NULL);
>>
>>  out:
>>   put_cpu_var(hv_24x7_reqb);
>> @@ -1425,16 +1521,13 @@ static int h_24x7_event_commit_txn(struct pmu *pmu)
>>   for (i = 0, res = result_buffer->results;
>>       i < result_buffer->num_results; i++, res = next_res) {
>>   struct perf_event *event = h24x7hw->events[res->result_ix];
>> - u16 num_elements = be16_to_cpu(res->num_elements_returned);
>> - u16 data_size = be16_to_cpu(res->result_element_data_size);
>>
>> - /* This code assumes that a result has only one element. */
>> - WARN_ON_ONCE(num_elements != 1);
>> + ret = get_count_from_result(event, result_buffer, res, &count,
>> +    &next_res);
>> + if (ret)
>> + continue;
>>
>> - count = be64_to_cpu(res->elements[0].element_data[0]);
>>   update_event_count(event, count);
>> -
>> - next_res = (void *) res->elements[0].element_data + data_size;
>>   }
>>
>>   put_cpu_var(hv_24x7_hw);

--
Thiago Jung Bauermann
IBM Linux Technology Center

Reply | Threaded
Open this post in threaded view
|

Re: [PATCH 0/8] Support for 24x7 hcall interface version 2

Michael Ellerman-2
Thiago Jung Bauermann <[hidden email]> writes:

> Michael Ellerman <[hidden email]> writes:
>> Thiago Jung Bauermann <[hidden email]> writes:
>>> The hypervisor interface to access 24x7 performance counters (which collect
>>> performance information from system power on to system power off) has been
>>> extended in POWER9 adding new fields to the request and result element
>>> structures.
>>>
>>> Also, results for some domains now return more than one result element and
>>> those need to be added to get a total count.
>>>
>>> The first two patches fix bugs in the existing code. The following 4
>>> patches are code improvements and the last two finally implement support
>>> for the changes in POWER9 described above.
>>>
>>> POWER8 systems only support version 1 of the interface, while POWER9
>>> systems only support version 2. I tested these patches on POWER8 to verify
>>> that there are no regressions, and also on POWER9 DD1.
>>
>> Where is version 2 documented?
>
> Only in an internal specification.

Presumably a PAPR ACR, what's the title? And is it approved/accepted?

>> And what happens when we boot on a POWER9 in POWER8 compatibility mode?
>
> It will still use version 2 of the API, and still require aggregation of
> result elements. Does this mean that hv_24x7_init should check
> cur_cpu_spec->oprofile_cpu_type for "ppc64/power9" instead of
> cpu_has_feature(CPU_FTR_ARCH_300)?

Yes.

cheers