
[DRAFT PR] Add a test case for fix of DB-10138 #1


Draft: wants to merge 1 commit into master

Conversation

ashetkar (Owner) commented Mar 7, 2025

ashetkar pushed a commit that referenced this pull request Apr 11, 2025
Summary:
This revision adds the procedure `yb_index_check()`, which checks whether a given index is consistent with its base relation. This operation doesn't support GIN indexes yet.
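A minimal usage sketch (hypothetical table and index names; the exact invocation form is assumed here, shown as a function call taking the index's regclass):

```
CREATE TABLE accounts (id int PRIMARY KEY, balance int);
CREATE INDEX accounts_balance_idx ON accounts (balance);
INSERT INTO accounts VALUES (1, 100), (2, 200);

-- Assumed invocation: reports an error if the index disagrees with its
-- base relation. GIN indexes are not supported yet.
SELECT yb_index_check('accounts_balance_idx'::regclass);
```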

#### Context
An index row has the following components:
- index attributes: key and non-key columns (if any)
- ybbasectid: system attribute storing the ybctid of base relation row
- ybuniqueidxkeysuffix: system attribute, present only for unique indexes (it is non-null only when the index is in nulls-are-distinct mode and the key columns contain at least one NULL; see the sketch below)
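For context on why the suffix exists: in nulls-are-distinct mode a unique index must admit multiple rows whose key columns are all NULL, and each such index row still needs a distinct DocKey. A standard-SQL illustration:

```
CREATE TABLE u (k int);
CREATE UNIQUE INDEX u_k ON u (k);

-- Both inserts succeed: NULLs are distinct, so the unique index ends up
-- holding two rows with identical (NULL) key columns. Their DocKeys are
-- disambiguated by ybuniqueidxkeysuffix (equal to each row's ybctid).
INSERT INTO u VALUES (NULL);
INSERT INTO u VALUES (NULL);
```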

The structure of an index row is as follows (in `<DocKey>` -> `<DocValue>` format):
- Non-unique index: (key cols, ybbasectid) -> (non-key cols)
- Unique index: (key cols, ybuniqueidxkeysuffix) -> (non-key cols, ybbasectid)

#### Consistency Definition
An index row is consistent if all of its attributes are consistent. An index attribute is consistent if its value and the corresponding base relation value are
- for key attributes: binary equal (semantic equality without binary equality runs the risk of allowing multiple index rows for a given base table row if the key column can have multiple binary representations).
- for non-key attributes: binary or semantically equal.

Note: if both the values are NULL, they are consistent.
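As a concrete illustration of semantic equality without binary equality, `numeric` values can compare equal while carrying different stored representations:

```
SELECT 1.0::numeric = 1.00::numeric AS semantically_equal;  -- true
SELECT 1.0::numeric::text, 1.00::numeric::text;             -- '1.0', '1.00'
```

If only semantic equality were required for key columns, the two representations above could produce two distinct index entries for the same base row.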

#### Consistency Check
Index consistency check is done in two steps:
  # Check for spurious index rows
  # Check for missing index rows

**Part 1: Check for spurious index rows**
Here, we check if the index contains a row that it should not. To do this:

For every index row, fetch the row in the base table (filtered by the partial index predicate) such that baserow.ybctid == indexrow.ybbasectid. If such a row doesn’t exist, use a baserow with all NULL values. The result is the same as a LEFT join on indexrow.ybbasectid = baserow.ybctid with the index table on the left (if the index were a regular relation).
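The join can be pictured with ordinary tables standing in for the index and the base relation (a hypothetical shadow schema; real index rows live in DocDB and cannot be queried like this):

```
-- Shadow model of a partial index ON (v1) WHERE active.
CREATE TABLE base_shadow (ybctid bytea PRIMARY KEY, v1 int, active bool);
CREATE TABLE idx_shadow  (v1 int, ybbasectid bytea);

-- Every index row, joined to its qualifying base row if one exists;
-- unmatched index rows surface with all-NULL base columns.
SELECT i.ybbasectid, i.v1, b.ybctid, b.v1 AS base_v1
FROM idx_shadow i
LEFT JOIN base_shadow b
  ON b.ybctid = i.ybbasectid
 AND b.active;                      -- partial index predicate
```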

Fetch the following columns as the join targetlist:
- from index row: ybbasectid, index attributes, ybuniqueidxkeysuffix (only for unique indexes)
- from base table row: ybctid, columns/expressions corresponding to index attributes

On the joined result, make the following checks:

  # ybbasectid should be non-null
  # ybbasectid should be equal to ybctid
  # index attributes and the corresponding base relation column/expression should be consistent as per the above definition
  # for unique indexes, ybuniqueidxkeysuffix should be non-null iff the index uses nulls-are-distinct mode and the key columns contain at least one NULL. When non-null, it should be equal to ybbasectid
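Continuing the shadow-table sketch above, the per-row checks for a non-unique index can be phrased as a query that returns only violating rows (check #4 is omitted because it applies to unique indexes only):

```
SELECT i.ybbasectid, i.v1
FROM idx_shadow i
LEFT JOIN base_shadow b
  ON b.ybctid = i.ybbasectid
 AND b.active
WHERE i.ybbasectid IS NULL                -- violates check #1
   OR b.ybctid IS NULL                    -- violates check #2
   OR i.v1 IS DISTINCT FROM b.v1;         -- violates check #3 (non-key)
```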

If the above checks pass for every row in the index, it implies that the index does not contain any spurious rows. This can be proved by contradiction as follows:

Let’s assume that the above checks passed for every row in the index, yet it contains a spurious row, namely indexrow1. This index row must satisfy the following:
- indexrow1.ybbasectid != null (from check #1)
- base table has a row, namely baserow, such that baserow.ybctid == indexrow1.ybbasectid (otherwise ybctid would be null and check #2 would have failed)
- index attributes of indexrow1 are consistent with baserow (from check #3)
- If the index is unique, indexrow1.ybuniqueidxkeysuffix is either null or equal to ybbasectid, depending on the index mode and key cols (from check #4)

The above shows that indexrow1 has a valid counterpart in the baserow. Given this, the only possible reason why indexrow1 should not have been present in the index is that another index row, namely indexrow2, must exist such that the pair (indexrow2, baserow) also satisfies the above checks.

We can say that indexrow1 and indexrow2
- have the same ybbasectid (baserow.ybctid == indexrow2.ybbasectid == indexrow1.ybbasectid).
- have binary equal values for key columns. This is because key cols of both index rows are binary equal to the corresponding baserow values (from check #3 and the definition of consistency).
- have identical ybuniqueidxkeysuffix (it depends on index type, mode, and key cols - all of these are already established to be the same for the two index rows).

The DocKey of the index row is created by a subset of (key cols, ybbasectid, ybuniqueidxkeysuffix). Each component is identical for the two index rows, implying identical DocKeys. This is not possible because DocDB does not allow duplicate DocKeys. Hence, such an indexrow1 does not exist.

**Part 2: Check for missing index rows**
This part checks that no entries are missing from the index. Given that it is already established that the index does not contain any spurious rows, it suffices to check that the index row count is what it should be. That is, for every qualifying row in the base table (filtered by the partial index predicate), the index should contain exactly one row.
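In the same shadow model, the Part 2 check reduces to a count comparison (valid only because Part 1 has already ruled out spurious rows):

```
SELECT (SELECT count(*) FROM idx_shadow) =
       (SELECT count(*) FROM base_shadow WHERE active) AS no_missing_rows;
```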

#### Implementation
- To fetch the index row and the corresponding base table row efficiently, batch nested loop join is used (details below).
- Both parts of the check use a single read time. This works out of the box because the entire check is executed as a single YSQL statement.

**Batch Nested Loop Join usage**
Batchable join clauses must be of the form `inner_indexed_var = expression on (outer_vars)` and the expression must not involve functions.

To satisfy the above requirement,
- join condition: baserow.ybctid == indexrow.ybbasectid.
- outer subplan: index relation scan
- inner subplan: base relation scan. BNL expects an index on the var referenced in the join clause (ybctid, in this case). So, a dummy primary key index object on the ybctid column is temporarily created (not persisted in the PG catalog). Like any other PK index in YB, this index points to the base relation and doesn't have a separate DocDB table. Because such an index object doesn't actually exist, the planner is bypassed and the join plan is hardcoded.
Jira: DB-15118

Test Plan:
   ./yb_build.sh --java-test org.yb.pgsql.TestPgRegressYbIndexCheck

Reviewers: amartsinchyk, tnayak

Reviewed By: amartsinchyk

Subscribers: smishra, yql

Differential Revision: https://phorge.dev.yugabyte.com/D41376
ashetkar pushed a commit that referenced this pull request Apr 11, 2025
Summary:
After f85bbca, TServerSharedData is set up as the prepare user data of the shared memory allocator. TServerSharedData is initially mapped to a temporary address before address negotiation runs, then remapped to the negotiated address once the postmaster starts and negotiation completes. But other code may access TServerSharedData both before and after the remap, so there is a data race on the TServerSharedData pointer between code using it and code remapping it.

Additionally, it is possible for a pointer to the temporary address to be retrieved, the shared memory then remapped, and the stale pointer then used, resulting in a SIGSEGV.

This diff puts access to the TServerSharedData pointer behind URCU (ConcurrentValue) and
additionally delays unmapping the temporary address until we have reached a quiescent state after
updating the TServerSharedData pointer.

This fixes TSAN failures from the above data race, with the following trace:
```
[ts-2] WARNING: ThreadSanitizer: data race (pid=10713)
[ts-2]   Write of size 8 at 0x721c00001db8 by thread T48:
[ts-2]     #0 yb::tserver::SharedMemoryManager::InitializeParentAllocatorsAndObjects() ${YB_SRC_ROOT}/src/yb/tserver/tserver_shared_mem.cc:328:9 (libtserver_shared.so+0x11f66)
[ts-2]     #1 yb::tserver::SharedMemoryManager::ExecuteParentNegotiator(std::shared_ptr<yb::AddressSegmentNegotiator> const&) ${YB_SRC_ROOT}/src/yb/tserver/tserver_shared_mem.cc:318:3 (libtserver_shared.so+0x11f66)

[ts-2]   Previous read of size 8 at 0x721c00001db8 by thread T41 (mutexes: write M0):
[ts-2]     #0 yb::tserver::SharedMemoryManager::SharedData() const ${YB_SRC_ROOT}/src/yb/tserver/tserver_shared_mem.h:182:13 (libtserver.so+0x26623d)
[ts-2]     #1 yb::tserver::DbServerBase::shared_object() const ${YB_SRC_ROOT}/src/yb/tserver/db_server_base.cc:136:31 (libtserver.so+0x26623d)
[ts-2]     #2 yb::tserver::TabletServer::SetYsqlDBCatalogVersions(yb::master::DBCatalogVersionDataPB const&) ${YB_SRC_ROOT}/src/yb/tserver/tablet_server.cc:1126:7 (libtserver.so+0x3fafc8)
[ts-2]     #3 yb::tserver::HeartbeatPoller::TryHeartbeat() ${YB_SRC_ROOT}/src/yb/tserver/heartbeater.cc:499:15 (libtserver.so+0x278e87)
[ts-2]     #4 yb::tserver::HeartbeatPoller::Poll() ${YB_SRC_ROOT}/src/yb/tserver/heartbeater.cc:273:19 (libtserver.so+0x276143)
[ts-2]     #5 yb::tserver::MasterLeaderPollScheduler::Impl::Run() ${YB_SRC_ROOT}/src/yb/tserver/master_leader_poller.cc:133:25 (libtserver.so+0x27ec33)
```
Jira: DB-15671

Test Plan: Jenkins

Reviewers: sergei

Reviewed By: sergei

Subscribers: zdrudi, ybase, yql

Differential Revision: https://phorge.dev.yugabyte.com/D42395
ashetkar pushed a commit that referenced this pull request Apr 11, 2025
…select statement with key columns only

Summary:
There are several scenarios where rows are requested by key columns only (including constraints and aggregates like count). The doc reader has an optimization for this kind of read: since the actual values do not matter (key column values can be extracted from the row's doc key), there is no need to decode row values; it is enough to check whether the row is tombstoned.

Unfortunately, this optimization was not updated to take fast backward scan into account: the iterator stops reading as soon as the first record it encounters is not marked with a tombstone value. That is a problem for fast backward scan, which reads a row in reverse order, from the oldest to the newest change (per sub doc key), and the oldest record generally corresponds to the initially inserted values.

The change updates the row existence check: scanning continues until all changes to the row have been taken into account, yielding the actual state of the row.

**Examples of row existence check before the fix**

Example #1
```
Record #1: KEY, T1 => PACKED ROW
Record #2: KEY, COL1, T2 => COL1 NEW VALUE 1

Forward scan / old backward scan:
#1 (alive, T1) => stop
result: row exists.

Fast backward scan:
#2 (alive, T2) => stop
result: row exists.
```

Example #2
```
Record #1: KEY, T3 => DEL
Record #2: KEY, T1 => PACKED ROW
Record #3: KEY, COL1, T2 => COL1 NEW VALUE

Forward scan / old backward scan:
#1 (deleted, T3), #2 (skipped: T1 < T3), #3 (skipped: T2 < T3) => stop
result: row does not exist.

Fast backward scan:
#3 (alive, T2) => stop
result: row exists.
```

Example #3
```
Record #1: KEY, T4 => PACKED ROW
Record #2: KEY, T3 => DEL
Record #3: KEY, T1 => PACKED ROW
Record #4: KEY, COL1, T2 => COL1 NEW VALUE

Forward scan / old backward scan:
#1 (alive, T4) => stop
result: row exists.

Fast backward scan:
#4 (alive, T2) => stop
result: row exists.
```

**Examples of row existence check with the fix**

Example #1
```
Record #1: KEY, T1 => PACKED ROW
Record #2: KEY, COL1, T2 => COL1 NEW VALUE 1

Fast backward scan:
#2 (alive, T2) => #1 (skipped, T1 < T2) => stop
result: row exists.
```

Example #2
```
Record #1: KEY, T3 => DEL
Record #2: KEY, T1 => PACKED ROW
Record #3: KEY, COL1, T2 => COL1 NEW VALUE

Fast backward scan:
#3 (alive, T2) => #2 (skipped: T1 < T2) => #1 (deleted: T3 > T2) => stop
result: row does not exist.
```

Example #3
```
Record #1: KEY, T4 => PACKED ROW
Record #2: KEY, T3 => DEL
Record #3: KEY, T1 => PACKED ROW
Record #4: KEY, COL1, T2 => COL1 NEW VALUE

Fast backward scan:
#4 (alive, T2) => #3 (skipped: T1 < T2) => #2 (deleted: T3 > T2) => #1 (alive: T4 > T3) => stop
result: row exists.
```
Jira: DB-15387

Test Plan:
./yb_build.sh --cxx-test pgwrapper_pg_single_tserver-test --gtest_filter PgSingleTServerTest/PgFastBackwardScanTest.Simple/Fast_WithNulls_kV1
./yb_build.sh --cxx-test pgwrapper_pg_single_tserver-test --gtest_filter PgSingleTServerTest/PgFastBackwardScanTest.Simple/Fast_WithNulls_kV2
./yb_build.sh --cxx-test pgwrapper_pg_single_tserver-test --gtest_filter PgSingleTServerTest/PgFastBackwardScanTest.Simple/Fast_WithNulls_kNone
./yb_build.sh --cxx-test pgwrapper_pg_single_tserver-test --gtest_filter PgSingleTServerTest/PgFastBackwardScanTest.Simple/Fast_WithoutNulls_kV1
./yb_build.sh --cxx-test pgwrapper_pg_single_tserver-test --gtest_filter PgSingleTServerTest/PgFastBackwardScanTest.Simple/Fast_WithoutNulls_kV2
./yb_build.sh --cxx-test pgwrapper_pg_single_tserver-test --gtest_filter PgSingleTServerTest/PgFastBackwardScanTest.Simple/Fast_WithoutNulls_kNone
./yb_build.sh --cxx-test pgwrapper_pg_single_tserver-test --gtest_filter PgSingleTServerTest/PgFastBackwardScanTest.Simple/Slow_WithNulls_kV1
./yb_build.sh --cxx-test pgwrapper_pg_single_tserver-test --gtest_filter PgSingleTServerTest/PgFastBackwardScanTest.Simple/Slow_WithNulls_kV2
./yb_build.sh --cxx-test pgwrapper_pg_single_tserver-test --gtest_filter PgSingleTServerTest/PgFastBackwardScanTest.Simple/Slow_WithNulls_kNone
./yb_build.sh --cxx-test pgwrapper_pg_single_tserver-test --gtest_filter PgSingleTServerTest/PgFastBackwardScanTest.Simple/Slow_WithoutNulls_kV1
./yb_build.sh --cxx-test pgwrapper_pg_single_tserver-test --gtest_filter PgSingleTServerTest/PgFastBackwardScanTest.Simple/Slow_WithoutNulls_kV2
./yb_build.sh --cxx-test pgwrapper_pg_single_tserver-test --gtest_filter PgSingleTServerTest/PgFastBackwardScanTest.Simple/Slow_WithoutNulls_kNone

**manual testing:**
```
SET yb_use_hash_splitting_by_default = FALSE;
CREATE TABLE t1(c0 int UNIQUE DEFAULT 1);
INSERT INTO t1(c0) VALUES (2);
DELETE FROM t1;
SELECT * FROM t1;
SELECT * FROM t1 GROUP BY t1.c0 ORDER BY t1.c0 DESC;
```

Reviewers: sergei, rthallam

Reviewed By: sergei, rthallam

Subscribers: ybase, yql

Tags: #jenkins-ready

Differential Revision: https://phorge.dev.yugabyte.com/D42212
ashetkar pushed a commit that referenced this pull request Apr 11, 2025
Summary:
After commit f85bbca, the vmodule flag is no longer respected by the postgres process; for example:
```
ybd release --cxx-test pgwrapper_pg_analyze-test --gtest_filter PgAnalyzeTest.AnalyzeSamplingColocated --test-args '--vmodule=pg_sample=1' -n 2 -- -p 1 -k
zgrep pg_sample ~/logs/latest_test/1.log
```
shows no vlogs.

The reason is that `VLOG(1)` is used early by
```
#0  0x00007f7e1b48b090 in google::InitVLOG3__(google::SiteFlag*, int*, char const*, int)@plt () from /net/dev-server-timur/share/code/yugabyte-db/build/debug-clang19-dynamic-ninja/lib/libyb_util_shmem.so
#1  0x00007f7e1b47616e in yb::(anonymous namespace)::NegotiatorSharedState::WaitProposal (this=0x7f7e215e8000) at ../../src/yb/util/shmem/reserved_address_segment.cc:108
#2  0x00007f7e1b4781e0 in yb::AddressSegmentNegotiator::Impl::NegotiateChild (fd=45) at ../../src/yb/util/shmem/reserved_address_segment.cc:252
#3  0x00007f7e1b4737ce in yb::AddressSegmentNegotiator::NegotiateChild (fd=45) at ../../src/yb/util/shmem/reserved_address_segment.cc:376
#4  0x00007f7e1b742b7b in yb::tserver::SharedMemoryManager::InitializePostmaster (this=0x7f7e202e9788 <yb::pggate::PgSharedMemoryManager()::shared_mem_manager>, fd=45) at ../../src/yb/tserver/tserver_shared_mem.cc:252
#5  0x00007f7e2023588f in yb::pggate::PgSetupSharedMemoryAddressSegment () at ../../src/yb/yql/pggate/pg_shared_mem.cc:29
#6  0x00007f7e202788e9 in YBCSetupSharedMemoryAddressSegment () at ../../src/yb/yql/pggate/ybc_pg_shared_mem.cc:22
#7  0x000055636b8956f5 in PostmasterMain (argc=21, argv=0x52937fe4e790) at ../../../../../../src/postgres/src/backend/postmaster/postmaster.c:1083
#8  0x000055636b774bfe in PostgresServerProcessMain (argc=21, argv=0x52937fe4e790) at ../../../../../../src/postgres/src/backend/main/main.c:209
#9  0x000055636b7751f2 in main ()
```
and caches the `vmodule` value before `InitGFlags` sets it from the environment.

The fix is to explicitly call `UpdateVmodule` from `InitGFlags` after setting `vmodule`.
Jira: DB-15888

Test Plan:
```
ybd release --cxx-test pgwrapper_pg_analyze-test --gtest_filter PgAnalyzeTest.AnalyzeSamplingColocated --test-args '--vmodule=pg_sample=1' -n 2 -- -p 1 -k
zgrep pg_sample ~/logs/latest_test/1.log
```

Reviewers: hsunder

Reviewed By: hsunder

Subscribers: ybase, yql

Tags: #jenkins-ready, #jenkins-trigger

Differential Revision: https://phorge.dev.yugabyte.com/D42731
ashetkar pushed a commit that referenced this pull request Jun 4, 2025
…rdup for tablegroup_name

Summary:
As part of D36859 / 0dbe7d6, backup and restore support for colocated tables when multiple tablespaces exist was introduced. Upon fetching the tablegroup_name from `pg_yb_tablegroup`, the value was read and assigned via `PQgetvalue` without copying. This led to a use-after-free bug when the tablegroup_name was later read in dumpTableSchema, because the result of the SQL query is cleared immediately afterwards (`PQclear`).

```
[P-yb-controller-1] ==3037==ERROR: AddressSanitizer: heap-use-after-free on address 0x51d0002013e6 at pc 0x55615b0a1f92 bp 0x7fff92475970 sp 0x7fff92475118
[P-yb-controller-1] READ of size 8 at 0x51d0002013e6 thread T0
[P-yb-controller-1]     #0 0x55615b0a1f91 in strcmp ${YB_LLVM_TOOLCHAIN_DIR}/src/llvm-project/compiler-rt/lib/asan/../sanitizer_common/sanitizer_common_interceptors.inc:470:5
[P-yb-controller-1]     #1 0x55615b1b90ba in dumpTableSchema ${YB_SRC_ROOT}/src/postgres/src/bin/pg_dump/pg_dump.c:15789:8
[P-yb-controller-1]     #2 0x55615b178163 in dumpTable ${YB_SRC_ROOT}/src/postgres/src/bin/pg_dump/pg_dump.c:15299:4
[P-yb-controller-1]     #3 0x55615b178163 in dumpDumpableObject ${YB_SRC_ROOT}/src/postgres/src/bin/pg_dump/pg_dump.c:10216:4
[P-yb-controller-1]     #4 0x55615b178163 in main ${YB_SRC_ROOT}/src/postgres/src/bin/pg_dump/pg_dump.c:1019:3
[P-yb-controller-1]     #5 0x7f3c0184e7e4 in __libc_start_main (/lib64/libc.so.6+0x3a7e4) (BuildId: fd70eb98f80391a177070fcb8d757a63fe49b802)
[P-yb-controller-1]     #6 0x55615b0894bd in _start (${BUILD_ROOT}/postgres/bin/ysql_dump+0x10d4bd)
[P-yb-controller-1]
[P-yb-controller-1] 0x51d0002013e6 is located 358 bytes inside of 2048-byte region [0x51d000201280,0x51d000201a80)
[P-yb-controller-1] freed by thread T0 here:
[P-yb-controller-1]     #0 0x55615b127196 in free ${YB_LLVM_TOOLCHAIN_DIR}/src/llvm-project/compiler-rt/lib/asan/asan_malloc_linux.cpp:52:3
[P-yb-controller-1]     #1 0x7f3c02d65e85 in PQclear ${YB_SRC_ROOT}/src/postgres/src/interfaces/libpq/fe-exec.c:755:3
[P-yb-controller-1]     #2 0x55615b1c0103 in getYbTablePropertiesAndReloptions ${YB_SRC_ROOT}/src/postgres/src/bin/pg_dump/pg_dump.c:19108:4
[P-yb-controller-1]     #3 0x55615b1b8fab in dumpTableSchema ${YB_SRC_ROOT}/src/postgres/src/bin/pg_dump/pg_dump.c:15765:3
[P-yb-controller-1]     #4 0x55615b178163 in dumpTable ${YB_SRC_ROOT}/src/postgres/src/bin/pg_dump/pg_dump.c:15299:4
[P-yb-controller-1]     #5 0x55615b178163 in dumpDumpableObject ${YB_SRC_ROOT}/src/postgres/src/bin/pg_dump/pg_dump.c:10216:4
[P-yb-controller-1]     #6 0x55615b178163 in main ${YB_SRC_ROOT}/src/postgres/src/bin/pg_dump/pg_dump.c:1019:3
[P-yb-controller-1]     #7 0x7f3c0184e7e4 in __libc_start_main (/lib64/libc.so.6+0x3a7e4) (BuildId: fd70eb98f80391a177070fcb8d757a63fe49b802)
[P-yb-controller-1]
[P-yb-controller-1] previously allocated by thread T0 here:
[P-yb-controller-1]     #0 0x55615b12742f in malloc ${YB_LLVM_TOOLCHAIN_DIR}/src/llvm-project/compiler-rt/lib/asan/asan_malloc_linux.cpp:68:3
[P-yb-controller-1]     #1 0x7f3c02d680a7 in pqResultAlloc ${YB_SRC_ROOT}/src/postgres/src/interfaces/libpq/fe-exec.c:633:28
[P-yb-controller-1]     #2 0x7f3c02d81294 in getRowDescriptions ${YB_SRC_ROOT}/src/postgres/src/interfaces/libpq/fe-protocol3.c:544:4
[P-yb-controller-1]     #3 0x7f3c02d7f793 in pqParseInput3 ${YB_SRC_ROOT}/src/postgres/src/interfaces/libpq/fe-protocol3.c:324:11
[P-yb-controller-1]     #4 0x7f3c02d6bcc8 in parseInput ${YB_SRC_ROOT}/src/postgres/src/interfaces/libpq/fe-exec.c:2014:2
[P-yb-controller-1]     #5 0x7f3c02d6bcc8 in PQgetResult ${YB_SRC_ROOT}/src/postgres/src/interfaces/libpq/fe-exec.c:2100:3
[P-yb-controller-1]     #6 0x7f3c02d6cd87 in PQexecFinish ${YB_SRC_ROOT}/src/postgres/src/interfaces/libpq/fe-exec.c:2417:19
[P-yb-controller-1]     #7 0x7f3c02d6cd87 in PQexec ${YB_SRC_ROOT}/src/postgres/src/interfaces/libpq/fe-exec.c:2256:9
[P-yb-controller-1]     #8 0x55615b1f45df in ExecuteSqlQuery ${YB_SRC_ROOT}/src/postgres/src/bin/pg_dump/pg_backup_db.c:296:8
[P-yb-controller-1]     #9 0x55615b1f4213 in ExecuteSqlQueryForSingleRow ${YB_SRC_ROOT}/src/postgres/src/bin/pg_dump/pg_backup_db.c:311:8
[P-yb-controller-1]     #10 0x55615b1c008d in getYbTablePropertiesAndReloptions ${YB_SRC_ROOT}/src/postgres/src/bin/pg_dump/pg_dump.c:19102:10
[P-yb-controller-1]     #11 0x55615b1b8fab in dumpTableSchema ${YB_SRC_ROOT}/src/postgres/src/bin/pg_dump/pg_dump.c:15765:3
[P-yb-controller-1]     #12 0x55615b178163 in dumpTable ${YB_SRC_ROOT}/src/postgres/src/bin/pg_dump/pg_dump.c:15299:4
[P-yb-controller-1]     #13 0x55615b178163 in dumpDumpableObject ${YB_SRC_ROOT}/src/postgres/src/bin/pg_dump/pg_dump.c:10216:4
[P-yb-controller-1]     #14 0x55615b178163 in main ${YB_SRC_ROOT}/src/postgres/src/bin/pg_dump/pg_dump.c:1019:3
[P-yb-controller-1]     #15 0x7f3c0184e7e4 in __libc_start_main (/lib64/libc.so.6+0x3a7e4) (BuildId: fd70eb98f80391a177070fcb8d757a63fe49b802)
```

This revision fixes the issue by using `pg_strdup` to make a copy of the string.
Jira: DB-15915

Test Plan: ./yb_build.sh asan --cxx-test integration-tests_xcluster_ddl_replication-test --gtest_filter XClusterDDLReplicationTest.DDLReplicationTablesNotColocated

Reviewers: aagrawal, skumar, mlillibridge, sergei

Reviewed By: aagrawal, sergei

Subscribers: sergei, yql

Differential Revision: https://phorge.dev.yugabyte.com/D43386
ashetkar pushed a commit that referenced this pull request Jun 4, 2025
…terTSConfig

Summary:
Fixing this ASAN error:
```
../../../../../../src/postgres/src/backend/commands/event_trigger.c:1817:36: runtime error: null pointer passed as argument 2, which is declared to never be null
/usr/include/string.h:44:28: note: nonnull attribute specified here
```
Call stack:
```
#0 0x55597fecb32d in EventTriggerCollectAlterTSConfig : src/postgres/src/backend/commands/event_trigger.c:1817:2
#1 0x55598004305e in DropConfigurationMapping : src/postgres/src/backend/commands/tsearchcmds.c:1518:2
```
The function `EventTriggerCollectAlterTSConfig()` is called with a `NULL` argument in the PG code:
`static void DropConfigurationMapping(......) {`
`EventTriggerCollectAlterTSConfig(stmt, cfgId, NULL, 0);`
This `NULL` can reach `memcpy()`:
`memcpy(command->d.atscfg.dictIds, dictIds, sizeof(Oid) * ndicts);    <<<  dictIds==NULL here`

This case was not anticipated by the code; the fix handles it.

Test Plan: ./yb_build.sh asan --java-test org.yb.pgsql.TestPgRegressTypesString#schedule

Reviewers: mihnea, kfranz

Reviewed By: kfranz

Subscribers: yql

Differential Revision: https://phorge.dev.yugabyte.com/D43395
ashetkar pushed a commit that referenced this pull request Jun 4, 2025
…lations

Summary:
This revision fixes two bugs in foreign keys that reference partitioned tables when the root and the leaf PK relations have different column orders. The details of the bugs are as follows:

  # When the foreign key is created using ALTER TABLE, the cleanup after the constraint validation is not done properly, leading to `TupleDesc reference leak` warnings. Specifically, the tuples in `EState.es_tupleTable` are not reset. It contains the ResultRelInfo.ri_PartitionTupleSlot when the root and leaf relations have different column orders.

  # When the root and leaf relations have different column orders, ExecGetChildToRootMap() allocates the tuple conversion map, if not already done. `YbFindReferencedPartition()` incorrectly calls this function in the current memory context; rather, it should be called in the query context. This leads to use-after-free when the foreign key constraint is in DEFERRED mode. Below is a detailed explanation:

When inserting into a relation with fk constraint, YbFindReferencedPartition() is invoked in two code paths with different memory context in each case:

invocation #1: AfterTriggerSaveEvent -> YbAddTriggerFKReferenceIntent -> YBCBuildYBTupleIdDescriptor -> YbFindReferencedPartition: In this case, it is invoked in per-query memory context.
invocation #2: afterTriggerInvokeEvents -> AfterTriggerExecute -> ExecCallTriggerFunc -> RI_FKey_check_ins -> RI_FKey_check -> YBCBuildYBTupleIdDescriptor -> YbFindReferencedPartition: In this case, it is invoked in per-tuple memory context.

In the IMMEDIATE mode, the same executor state is used in both invocations. Invocation #1 allocates the map in the per-query memory context, and invocation #2 just reads it, without needing to reallocate it.
In the DEFERRED case, the executor states are different. afterTriggerInvokeEvents() creates a new local executor state to execute the triggers. Hence, invocation #2 needs to allocate the map, but it ends up doing so in the (incorrect) per-tuple memory context. Consider the case when two (or more) tuples are batch-inserted into the FK relation. The first trigger execution invokes ExecGetChildToRootMap(), which marks ri_ChildToRootMapValid as valid and allocates the map in the per-tuple context. This context is reset before the next trigger execution. The second trigger execution segfaults because ri_ChildToRootMapValid indicates the map is valid, but that is no longer the case.
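A hypothetical shape of the triggering scenario (all names invented for illustration; the essential ingredients are a leaf partition whose column order differs from the root and a DEFERRED foreign key checked for a batch of rows):

```
CREATE TABLE pk_root (a int PRIMARY KEY, b int) PARTITION BY RANGE (a);
CREATE TABLE pk_leaf (b int, a int NOT NULL);           -- columns reordered
ALTER TABLE pk_root ATTACH PARTITION pk_leaf FOR VALUES FROM (0) TO (100);
INSERT INTO pk_root VALUES (1, 10), (2, 20);

CREATE TABLE fk_tab (a int REFERENCES pk_root (a)
                     DEFERRABLE INITIALLY DEFERRED);
BEGIN;
INSERT INTO fk_tab VALUES (1), (2);  -- two FK checks are queued
COMMIT;  -- deferred triggers fire here; before the fix, the second check
         -- could read the conversion map freed with the per-tuple context
```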

In passing, also initialise estate->yb_es_pk_proutes to NIL to keep the behaviour the same as other fields of list type in EState.
Jira: DB-16152

Test Plan:
./yb_build.sh --java-test 'org.yb.pgsql.TestPgRegressForeignKey'

Close: #26772

Reviewers: dmitry, kramanathan, #db-approvers

Reviewed By: kramanathan, #db-approvers

Subscribers: svc_phabricator, yql

Differential Revision: https://phorge.dev.yugabyte.com/D43893
ashetkar pushed a commit that referenced this pull request Jun 4, 2025
Summary:
Added a `check_data_blocks` command to the sst_dump tool. It checks the consistency of all data blocks using the SST file index and per-block checksums.
The `--skip_uncompress` option makes the `check_data_blocks` command skip uncompressing data blocks (false by default).

If any corrupt data blocks are found, the tool generates `dd` commands that can be used to save specific blocks (the corrupted block plus one block before and after it) for analysis.

Example usage:
```
./sst_dump --command=check_data_blocks --file=000046.sst 2>&1 | tee sst_dump.out
cat sst_dump.out | grep '^dd '
```
Jira: DB-16720

Test Plan:
Tested manually on artificially corrupt SST file:
```
~/code/yugabyte5/build/latest/bin/sst_dump --command=check_data_blocks --file=000046.sst 2>&1 | tee sst_dump.out
WARNING: Logging before InitGoogleLogging() is written to STDERR
I0517 07:21:12.046017  3481 sst_dump_tool.cc:433] Checking data block #0 handle: BlockHandle { offset: 0 size: 3927 }
W0517 07:21:12.051733  3481 file_reader_writer.cc:101] Read attempt #1 failed in file 000046.sst.sblock.0 : Corruption (yb/rocksdb/table/format.cc:345): Block checksum mismatch in file: 000046.sst.sblock.0, block handle: BlockHandle { offset: 996452 size: 4024 }, expected checksum: 1118200703, actual checksum: 1023486239.
W0517 07:21:12.051852  3481 file_reader_writer.cc:101] Read attempt #2 failed in file 000046.sst.sblock.0 : Corruption (yb/rocksdb/table/format.cc:345): Block checksum mismatch in file: 000046.sst.sblock.0, block handle: BlockHandle { offset: 996452 size: 4024 }, expected checksum: 1118200703, actual checksum: 1023486239.
E0517 07:21:12.051862  3481 format.cc:466] ReadBlockContents: Corruption (yb/rocksdb/table/format.cc:345): Block checksum mismatch in file: 000046.sst.sblock.0, block handle: BlockHandle { offset: 996452 size: 4024 }, expected checksum: 1118200703, actual checksum: 1023486239.
    @     0x7fd68e2eafde  rocksdb::SstFileReader::CheckDataBlocks()
    @     0x7fd68e2ec8ca  rocksdb::SSTDumpTool::Run()
    @     0x555855a53f78  main
    @     0x7fd688e45ca3  __libc_start_main
    @     0x555855a53eae  _start
W0517 07:21:12.056023  3481 sst_dump_tool.cc:441] Failed to read block with handle: BlockHandle { offset: 996452 size: 4024 }. Corruption (yb/rocksdb/table/format.cc:345): Block checksum mismatch in file: 000046.sst.sblock.0, block handle: BlockHandle { offset: 996452 size: 4024 }, expected checksum: 1118200703, actual checksum: 1023486239.
from [] to []
Process 000046.sst
Sst file format: block-based
dd if="000046.sst.sblock.0" bs=1 skip=992433 count=4014 of="000046.sst.sblock.0.offset_992433.size_4014.part"
dd if="000046.sst.sblock.0" bs=1 skip=996452 count=4024 of="000046.sst.sblock.0.offset_996452.size_4024.part"
dd if="000046.sst.sblock.0" bs=1 skip=1000481 count=4019 of="000046.sst.sblock.0.offset_1000481.size_4019.part"
W0517 07:21:12.634256  3481 file_reader_writer.cc:101] Read attempt #1 failed in file 000046.sst.sblock.0 : Corruption (yb/rocksdb/table/format.cc:345): Block checksum mismatch in file: 000046.sst.sblock.0, block handle: BlockHandle { offset: 119754675 size: 11766 }, expected checksum: 84894407, actual checksum: 2295535311.
W0517 07:21:12.634332  3481 file_reader_writer.cc:101] Read attempt #2 failed in file 000046.sst.sblock.0 : Corruption (yb/rocksdb/table/format.cc:345): Block checksum mismatch in file: 000046.sst.sblock.0, block handle: BlockHandle { offset: 119754675 size: 11766 }, expected checksum: 84894407, actual checksum: 2295535311.
E0517 07:21:12.634339  3481 format.cc:466] ReadBlockContents: Corruption (yb/rocksdb/table/format.cc:345): Block checksum mismatch in file: 000046.sst.sblock.0, block handle: BlockHandle { offset: 119754675 size: 11766 }, expected checksum: 84894407, actual checksum: 2295535311.
    @     0x7fd68e2eafde  rocksdb::SstFileReader::CheckDataBlocks()
    @     0x7fd68e2ec8ca  rocksdb::SSTDumpTool::Run()
    @     0x555855a53f78  main
    @     0x7fd688e45ca3  __libc_start_main
    @     0x555855a53eae  _start
W0517 07:21:12.638135  3481 sst_dump_tool.cc:441] Failed to read block with handle: BlockHandle { offset: 119754675 size: 11766 }. Corruption (yb/rocksdb/table/format.cc:345): Block checksum mismatch in file: 000046.sst.sblock.0, block handle: BlockHandle { offset: 119754675 size: 11766 }, expected checksum: 84894407, actual checksum: 2295535311.
dd if="000046.sst.sblock.0" bs=1 skip=119742901 count=11769 of="000046.sst.sblock.0.offset_119742901.size_11769.part"
dd if="000046.sst.sblock.0" bs=1 skip=119754675 count=11766 of="000046.sst.sblock.0.offset_119754675.size_11766.part"
dd if="000046.sst.sblock.0" bs=1 skip=119766446 count=1384 of="000046.sst.sblock.0.offset_119766446.size_1384.part"
```

```
cat sst_dump.out | grep '^dd '
dd if="000046.sst.sblock.0" bs=1 skip=992433 count=4014 of="000046.sst.sblock.0.offset_992433.size_4014.part"
dd if="000046.sst.sblock.0" bs=1 skip=996452 count=4024 of="000046.sst.sblock.0.offset_996452.size_4024.part"
dd if="000046.sst.sblock.0" bs=1 skip=1000481 count=4019 of="000046.sst.sblock.0.offset_1000481.size_4019.part"
dd if="000046.sst.sblock.0" bs=1 skip=119742901 count=11769 of="000046.sst.sblock.0.offset_119742901.size_11769.part"
dd if="000046.sst.sblock.0" bs=1 skip=119754675 count=11766 of="000046.sst.sblock.0.offset_119754675.size_11766.part"
dd if="000046.sst.sblock.0" bs=1 skip=119766446 count=1384 of="000046.sst.sblock.0.offset_119766446.size_1384.part"
```

Reviewers: hsunder, yyan

Reviewed By: yyan

Subscribers: ybase

Tags: #jenkins-ready

Differential Revision: https://phorge.dev.yugabyte.com/D44035
ashetkar pushed a commit that referenced this pull request Jun 4, 2025
Summary:
Currently, `CREATE RULE` does not bump the catalog version under the assumption that creating a rule can only insert entries into system catalog tables but not update or delete them.

However, this assumption is false; we need to bump the catalog version in order to maintain behavior parity with vanilla PG.

### What are rules?

Rules allow users to specify additional commands that are to be performed automatically whenever an insert/update/delete is executed on a given table. For example:

```
CREATE RULE redirect_to_destination AS ON INSERT TO intermediate_table
    WHERE NEW.id > 25 DO INSTEAD
INSERT INTO destination_table VALUES (NEW.id, NEW.name);
```

This rule redirects inserts from `intermediate_table` to `destination_table` instead when the inserted values fall into the given range.

### Rules internals

PG stores rules in the `pg_rewrite` table. Additionally, there is a column in `pg_class` (`relhasrules`) that specifies whether the relation has rules on it.

The rules and the relhasrules flag are stored in the relcache. When we build the relcache entry for a table, first we check if `relhasrules` is true. If so, we load the rules from `pg_rewrite`; otherwise, we just set the rules to null:

```
lang=c,name=relcache.c
if (relation->rd_rel->relhasrules)
	RelationBuildRuleLock(relation);
else
{
	relation->rd_rules = NULL;
	relation->rd_rulescxt = NULL;
}
```

We read the rules during query planning in `RewriteQuery()`:

```
lang=c,name=rewriteHandler.c
locks = matchLocks(event, rt_entry_relation->rd_rules,
					result_relation, parsetree, &hasUpdate);

product_orig_rt_length = list_length(parsetree->rtable);
product_queries = fireRules(parsetree,
                            result_relation,
                            event,
                            locks,
                            &instead,
                            &returning,
                            &qual_product);
```

### How the issue occurs

The issue comes up during the following scenario:
1. On backend #1, we build the relcache entry for `intermediate_table` before there are any rules on it. This could happen during relcache preloading, when we build the relcache entries for all user tables, or during an ad-hoc load of the table's relcache entry.
2. Create the rule on `intermediate_table` on backend #2. This creates a row in `pg_rewrite` and flips `relhasrules` to `true`, but the relcache entries in backend #1 stay the same because we don't bump the catalog version.
3. On backend #1, do an insert on `intermediate_table` that should trigger the rule. Because the relcache entry still has `rd_rules=NULL`, none of the rules fire; see the sketch below.
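A hedged end-to-end SQL sketch of the scenario (session labels as comments; the rule and table names are taken from the example above):

```
-- Session 1: build the relcache entry while no rules exist.
CREATE TABLE intermediate_table (id int, name text);
CREATE TABLE destination_table (id int, name text);
SELECT count(*) FROM intermediate_table;  -- loads the entry, rd_rules = NULL

-- Session 2: create the rule; relhasrules flips to true in pg_class.
CREATE RULE redirect_to_destination AS ON INSERT TO intermediate_table
    WHERE NEW.id > 25 DO INSTEAD
INSERT INTO destination_table VALUES (NEW.id, NEW.name);

-- Session 1 again: without a catalog version bump, the stale relcache
-- entry still has rd_rules = NULL, so the rule does not fire.
INSERT INTO intermediate_table VALUES (30, 'x');
SELECT count(*) FROM destination_table;  -- expected 1, got 0 before the fix
```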
Jira: DB-16686

Test Plan:
```
./yb_build.sh release --sj --cxx-test pg_catalog_version-test --gtest-filter "*CreateRule*"
```

Reviewers: myang

Reviewed By: myang

Subscribers: yql

Differential Revision: https://phorge.dev.yugabyte.com/D44279