Saturday, September 6, 2025

Databricks-first way to do row-level security PowerBI


1) Security mapping table (who can see what)

-- Use your real catalog/schema names
CREATE SCHEMA IF NOT EXISTS prod_catalog.security;

CREATE TABLE IF NOT EXISTS prod_catalog.security.user_security_gl (
  user_email STRING COMMENT 'AAD/Okta sign-in email shown to Databricks',
  co         STRING COMMENT 'Company code the user may see (use * for all)',
  -- add more dims if needed, e.g. state STRING, product STRING
  inserted_at TIMESTAMP DEFAULT current_timestamp()
)
COMMENT 'Row-access map for gl_summary';

Example rows (replace with yours):

INSERT INTO prod_catalog.security.user_security_gl (user_email, co) VALUES
  ('alice@yourorg.com','00001'),
  ('alice@yourorg.com','00002'),
  ('bob@yourorg.com','00003'),
  ('admin@yourorg.com','*');  -- * means “see everything”

2) Dynamic RLS view using the current user’s email

-- Base table example: prod_catalog.eim_gl_finance_reporting_db.gl_summary
-- Filter joins on CO; extend to more columns if needed.
CREATE OR REPLACE VIEW prod_catalog.eim_gl_finance_reporting_db.vw_gl_summary_rls AS
SELECT m.*
FROM xxxxxx.tbl_summary m
WHERE EXISTS (
  SELECT 1
  FROM security.user_security u
  WHERE lower(u.user_email) = lower(current_user())  -- the signed-in user
    AND (u.co = m.co OR u.co = '*')
);

Notes:

  • current_user() returns the principal seen by Databricks (typically the AAD email/UPN).

  • EXISTS avoids duplicate rows even if a user has multiple allowed values.

3) Lock down privileges: expose only the view

-- Make sure consumers can’t query the base table directly.
REVOKE SELECT ON TABLE prod_catalog.eim_gl_finance_reporting_db.gl_summary FROM `users`;

-- Grant access to the secure view (replace group with yours, e.g. power_bi_readers)
GRANT USAGE ON CATALOG prod_catalog TO `power_bi_readers`;
GRANT USAGE ON SCHEMA prod_catalog.eim_gl_finance_reporting_db TO `power_bi_readers`;
GRANT SELECT ON VIEW prod_catalog.eim_gl_finance_reporting_db.vw_summary_rls TO `power_bi_readers`;

Quick checks (optional):

-- Who am I (as Databricks sees me)?
SELECT current_user();

-- What rows should I be allowed?
SELECT * FROM security.user_security
WHERE lower(user_email)=lower(current_user());

4) Power BI wiring (so it auto-filters)

  1. In Power BI Desktop: Get Data ➜ Databricks connector.

  2. Authentication: Sign in with your Microsoft/Azure AD account (OAuth).

  3. Storage mode: choose DirectQuery (required for server-side RLS to apply at query time).

  4. Select xxxxx.vw_tbl_summary (use the view, not the base table).

  5. Publish to Power BI Service. In the dataset Settings → Data source credentials, ensure OAuth2 and enable “Use SSO via Azure AD for DirectQuery queries.”

  6. Share the report—viewers will only see rows allowed by their email mapping.

5) Variations you may need

  • Multiple dimensions: add columns (e.g., state, product) to user_security_gl and extend the EXISTS predicate with more OR checks (or add multiple EXISTS for clarity).

  • All access: keep the '*' wildcard (or manage an is_admin flag).

  • Performance: if the base table is large and co is your primary filter, consider clustering/z-ordering on co.

This setup keeps RLS entirely in Databricks. As long as Power BI uses AAD SSO + DirectQuery and points to the view, filtering happens automatically per user email.