Enhance presto_unnest() Functionality

## Current State

The `presto_unnest()` function in `R/presto_unnest.R` currently has several limitations:

1. **Single column limitation**: Line 92-94 enforces that only one column can be unnested at a time:
   ```
   } else if (length(valid_col_names) > 1) {
     stop("Currently only one column can be unnested at a time", call. = FALSE)
   }
   ```

2. **Manual alias specification required**: For `ARRAY(ROW(...))` types, users must manually specify field names in the SQL alias, even though this information is available via `presto_type()`.

3. **No tidyr-like options**: Missing `names_sep` and `names_repair` parameters that would make column naming more flexible and consistent with `tidyr::unnest()`.

4. **No partial projection**: Cannot extract only specific fields from `ARRAY(ROW(...))` types.

## Detailed Improvements

### 1. Support Multiple Column Unnesting

**Current Code Location**: `R/presto_unnest.R` lines 67-119

**Implementation Steps**:
1. Remove the single-column check at lines 92-94
2. Modify `presto_unnest.tbl_presto()` to handle multiple columns:
   - Accept multiple columns via `tidyselect::eval_select()` (already supports this at line 74)
   - Store multiple column names in the `lazy_unnest_query` structure
   - Update `op_vars.lazy_unnest_query()` (lines 130-139) to handle multiple unnested columns
3. Update `sql_render.unnest_query()` (lines 214-258) to generate multiple `CROSS JOIN UNNEST` clauses

**SQL Generation**:
- For single column: `CROSS JOIN UNNEST(col) AS t(elem)`
- For multiple columns: `CROSS JOIN UNNEST(col1) AS t1(elem1), UNNEST(col2) AS t2(elem2)`
- Handle cases where arrays have different lengths (Presto behavior: Cartesian product)

**Example Usage**:
```r
tbl(con, "test") %>%
  presto_unnest(c(arr1, arr2), values_to = c("val1", "val2"))
```

**Edge Cases**:
- Arrays with different lengths (should produce Cartesian product)
- NULL arrays (should produce no rows for that array)
- Empty arrays (should produce no rows for that array)
- Mix of NULL and non-NULL arrays

### 2. Automatic ROW Field Detection

**Current Code Location**: `R/presto_unnest.R` lines 214-258 (SQL rendering)

**Implementation Steps**:
1. In `presto_unnest.tbl_presto()`, when a column is detected as `ARRAY(ROW(...))`:
   - Call `presto_type()` on the column to get type information
   - Parse the ROW type signature to extract field names (see `R/dbColumnType.R` lines 85-132 for type parsing logic)
   - Use `init.presto.field.from.json()` from `R/presto.field.R` lines 76-189 to extract ROW field structure
2. Automatically generate alias names from ROW field names:
   - If `ARRAY(ROW(id INTEGER, name VARCHAR))`, automatically use `AS t(id, name)`
   - Allow override via `values_to` parameter if provided
3. Update `sql_render.unnest_query()` to use detected field names

**Code References**:
- Type detection: `R/presto_type.R` lines 41-64
- ROW field extraction: `R/presto.field.R` lines 130-170
- Type string formatting: `R/dbColumnType.R` lines 85-132

**Example Usage**:
```r
# Current (manual):
tbl(con, "test") %>%
  presto_unnest(users, values_to = c("user_id", "user_name"))

# After (automatic):
tbl(con, "test") %>%
  presto_unnest(users)  # Automatically extracts id and name from ARRAY(ROW(id BIGINT, name VARCHAR))
```

**SQL Example**:
```sql
-- Before (manual):
SELECT * FROM test
CROSS JOIN UNNEST(users) AS t(user_id, user_name)

-- After (automatic detection):
SELECT * FROM test
CROSS JOIN UNNEST(users) AS t(id, name)  -- Field names from ROW type
```

### 3. Add tidyr-like Options

**Implementation Steps**:
1. Add `names_sep` parameter (default `NULL`):
   - When unnesting multiple columns, use separator to create column names
   - Example: `names_sep = "_"` with columns `arr1`, `arr2` produces `arr1_elem`, `arr2_elem`
2. Add `names_repair` parameter (default `"check_unique"`):
   - Options: `"check_unique"`, `"unique"`, `"universal"`, `"minimal"`
   - Use `vctrs::vec_as_names()` for name repair (similar to tidyr)
3. Apply name repair after generating column names from `values_to` or automatic detection

**Example Usage**:
```r
tbl(con, "test") %>%
  presto_unnest(
    c(arr1, arr2),
    values_to = c("val1", "val2"),
    names_sep = "_",
    names_repair = "unique"
  )
```

**Edge Cases**:
- Duplicate column names after unnesting
- Column names that conflict with existing table columns
- Special characters in ROW field names

### 4. Support Partial Projections

**Implementation Steps**:
1. Add `fields` parameter (default `NULL` for all fields):
   - Accept character vector of field names to extract
   - Example: `fields = c("id", "name")` extracts only those fields from `ARRAY(ROW(id, name, email, ...))`
2. In SQL generation, only include specified fields in the UNNEST alias
3. Validate that specified fields exist in the ROW type

**Example Usage**:
```r
# Extract only id and name from ARRAY(ROW(id, name, email, created_at))
tbl(con, "test") %>%
  presto_unnest(users, fields = c("id", "name"))
```

**SQL Example**:
```sql
-- Full projection:
SELECT * FROM test
CROSS JOIN UNNEST(users) AS t(id, name, email, created_at)

-- Partial projection (fields = c("id", "name")):
SELECT * FROM test
CROSS JOIN UNNEST(users) AS t(id, name)
```

## Files to Modify

### Primary Files
- **`R/presto_unnest.R`** (lines 12-258):
  - Update `presto_unnest.tbl_presto()` (lines 67-119) to handle multiple columns and ROW field detection
  - Update `op_vars.lazy_unnest_query()` (lines 130-139) for multiple columns
  - Update `sql_render.unnest_query()` (lines 214-258) for SQL generation
  - Add helper function to detect and parse ROW types using `presto_type()`

### Supporting Files
- **`R/presto_type.R`**: May need helper to extract ROW field names from type signature
- **`R/dbColumnType.R`**: Reference `format_presto_type_string()` (lines 85-132) for ROW parsing logic

### Test Files
- **`tests/testthat/test-presto_unnest.R`**: Add comprehensive tests for:
  - Multiple column unnesting (various combinations)
  - Automatic ROW field detection
  - `names_sep` and `names_repair` behavior
  - Partial projections
  - Edge cases (NULL arrays, empty arrays, different lengths)
  - Integration with existing tests (group_by, arrange, window functions, CTEs)

## Implementation Checklist

- [ ] Remove single-column limitation check
- [ ] Implement multiple column support in `presto_unnest.tbl_presto()`
- [ ] Update `op_vars.lazy_unnest_query()` for multiple columns
- [ ] Update `sql_render.unnest_query()` for multiple UNNEST clauses
- [ ] Add ROW type detection using `presto_type()`
- [ ] Implement automatic field name extraction from ROW types
- [ ] Add `names_sep` parameter and logic
- [ ] Add `names_repair` parameter and logic (using `vctrs::vec_as_names()`)
- [ ] Add `fields` parameter for partial projections
- [ ] Add validation for `fields` parameter (check field existence)
- [ ] Update function documentation with new parameters
- [ ] Add examples to function documentation
- [ ] Write comprehensive tests for all new features
- [ ] Test edge cases (NULL, empty arrays, different lengths)
- [ ] Ensure backward compatibility with existing code

## Testing Requirements

### Unit Tests
1. Multiple column unnesting with same-length arrays
2. Multiple column unnesting with different-length arrays (Cartesian product)
3. Automatic ROW field detection for `ARRAY(ROW(...))`
4. `names_sep` parameter behavior
5. `names_repair` parameter with various options
6. Partial projections with `fields` parameter
7. Error handling: invalid field names, non-ROW types with `fields`

### Integration Tests
1. Multiple unnest + group_by + summarize
2. Multiple unnest + arrange
3. Multiple unnest + window functions
4. Multiple unnest + CTEs
5. Backward compatibility: existing single-column usage still works

### Edge Case Tests
1. NULL arrays
2. Empty arrays
3. Mix of NULL and non-NULL arrays
4. Arrays with zero elements
5. Very large arrays
6. Nested complex types (ARRAY(ROW(...)) with nested ROWs)

## Acceptance Criteria

1. ✅ Can unnest multiple columns in a single call
2. ✅ Automatically detects and uses ROW field names for `ARRAY(ROW(...))` types
3. ✅ `names_sep` parameter works as expected
4. ✅ `names_repair` parameter works with all supported options
5. ✅ Partial projections work with `fields` parameter
6. ✅ All existing tests pass (backward compatibility)
7. ✅ New comprehensive test suite passes
8. ✅ Documentation updated with examples
9. ✅ Function signature matches tidyr::unnest() where applicable

## Related
Part of RPresto Improvement Plan - Feature 1


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Enhance presto_unnest() Functionality #325

Current State

Detailed Improvements

1. Support Multiple Column Unnesting

2. Automatic ROW Field Detection

3. Add tidyr-like Options

4. Support Partial Projections

Files to Modify

Primary Files

Supporting Files

Test Files

Implementation Checklist

Testing Requirements

Unit Tests

Integration Tests

Edge Case Tests

Acceptance Criteria

Related

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Enhance presto_unnest() Functionality #325

Description

Current State

Detailed Improvements

1. Support Multiple Column Unnesting

2. Automatic ROW Field Detection

3. Add tidyr-like Options

4. Support Partial Projections

Files to Modify

Primary Files

Supporting Files

Test Files

Implementation Checklist

Testing Requirements

Unit Tests

Integration Tests

Edge Case Tests

Acceptance Criteria

Related

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions