Commit 3d779bd

jmchilton and claude committed
gxwf Tool Shed search parity (tool-search, repo-search, tool-versions, tool-revisions)
Mirror @galaxy-tool-util/search and the four CLI subcommands.

Pydantic v2 wire models with stringified-pagination coercion, requests-based client (ToolFetchError, 404→empty-page on /api/tools), build_repo_query with owner: / category: keywords, get_tool_revisions ordered oldest→newest, TRS versions helpers, and ToolSearchService (multi-source dedup + score sort + optional ParsedTool enrich). NormalizedToolHit/NormalizedRepoHit emit camelCase JSON via alias_generator=to_camel for byte-equivalent --json envelopes.

Fixtures synced from the TS package; 34 unit + CLI tests cover model normalization, pagination stop conditions, version filtering, exit codes, and JSON shape.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
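To make the `owner:` / `category:` keyword handling concrete, the sketch below reproduces `build_repo_query` from this commit's client module so that it runs standalone, then prints the query strings it yields. The sample owner and category values are illustrative only.

```python
from typing import List, Optional


def build_repo_query(query: str, *, owner: Optional[str] = None, category: Optional[str] = None) -> str:
    """Copy of the helper added in this commit, reproduced so the example runs standalone."""
    parts: List[str] = []
    trimmed = query.strip()
    if trimmed:
        parts.append(trimmed)
    if owner:
        parts.append(f"owner:{owner}")
    if category:
        # Quote the category only when it contains whitespace.
        has_space = any(c.isspace() for c in category)
        parts.append(f'category:"{category}"' if has_space else f"category:{category}")
    return " ".join(parts)


plain = build_repo_query("  fastqc ")
filtered = build_repo_query("fastqc", owner="devteam", category="sequence analysis")
keyword_only = build_repo_query("", owner="devteam")

print(plain)         # fastqc
print(filtered)      # fastqc owner:devteam category:"sequence analysis"
print(keyword_only)  # owner:devteam
```

An empty free-text query is allowed: only the keyword filters are sent in that case.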
1 parent 1eb5f07 commit 3d779bd

21 files changed

Lines changed: 1874 additions & 1 deletion

doc/source/dev/wf_tooling.md

Lines changed: 29 additions & 0 deletions
@@ -26,6 +26,10 @@ Galaxy's workflow CLI tooling spans three packages at different abstraction leve
 | `gxwf roundtrip-tree` | galaxy-tool-util | Round-trip validate all workflows in a directory |
 | `gxwf lint-tree` | galaxy-tool-util | Lint all workflows in a directory |
 | `gxwf convert-tree` | galaxy-tool-util | Batch convert workflows in a directory |
+| `gxwf tool-search` | galaxy-tool-util | Search the Tool Shed for tools matching a query |
+| `gxwf repo-search` | galaxy-tool-util | Search the Tool Shed for repositories (popularity-boosted) |
+| `gxwf tool-versions` | galaxy-tool-util | List TRS-published versions of a Tool Shed tool |
+| `gxwf tool-revisions` | galaxy-tool-util | List changeset revisions that publish a Tool Shed tool |
 | `gxwf viz` | galaxy-tool-util | Interactive Cytoscape graph (requires gxformat2) |
 | `gxwf abstract-export` | galaxy-tool-util | Abstract CWL export (requires gxformat2) |
 | `gxwf mermaid` | galaxy-tool-util | Mermaid diagram (requires gxformat2) |
@@ -256,6 +260,31 @@ gxwf-to-format2 --compact my-workflow.ga # strip positions

 Both commands share `--compact`, `--json`, and `-o` flags. The key difference: `gxwf convert` (schema-aware) produces proper `state` dicts by consulting tool definitions to decode double-encoded JSON strings. `gxwf-to-format2` copies the raw `tool_state` strings as-is since it has no tool schema.

+### Tool Shed Search
+
+Discover tools and repositories in a Tool Shed without leaving the CLI. All four subcommands accept `--toolshed-url` (defaults to the main Galaxy Tool Shed) and a `--json` flag for machine-readable output. Exit codes: `0` on hits, `2` when nothing matched, `3` on a Tool Shed transport failure.
+
+```bash
+gxwf tool-search fastqc # tools matching the query
+gxwf tool-search fastqc --owner devteam # client-side owner filter
+gxwf tool-search fastqc --match-name # drop hits whose name lacks any query token
+gxwf tool-search fastqc --enrich --cache-dir ~/.galaxy/tool_info_cache # attach ParsedTool
+
+gxwf repo-search fastqc --owner devteam --category "sequence analysis"
+# server-side owner: / category: keywords
+
+gxwf tool-versions devteam/fastqc/fastqc # TRS-published versions, oldest→newest
+gxwf tool-versions devteam~fastqc~fastqc --latest # only the newest
+
+gxwf tool-revisions devteam/fastqc/fastqc # changeset revisions
+gxwf tool-revisions devteam/fastqc/fastqc --tool-version 0.74+galaxy0
+gxwf tool-revisions devteam/fastqc/fastqc --latest --json # newest, machine-readable
+```
+
+`tool-search` flattens hits into a snake-case `NormalizedToolHit` (with derived `trs_tool_id` and `full_tool_id`) and sorts by Whoosh BM25 score; the underlying TS package emits the same shape. `tool-revisions` orders matches via Tool Shed's `get_ordered_installable_revisions`. Tool versions are not monotonic: the same `version` string can legally appear in multiple changesets, so prefer `tool-revisions` when pinning workflows.
+
+The `ToolSearchService` class (`galaxy.tool_util.workflow_state.tool_search`) lets library callers fan a query across multiple Tool Sheds, dedupe `(owner, repo, tool_id)` first-source-wins, and optionally enrich each hit through `ToolShedGetToolInfo`.
+
 ### Visualization and Abstract Export

 These subcommands are pass-throughs to the corresponding gxformat2 binaries (require gxformat2 installed):
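The first-source-wins dedupe that the doc change attributes to `ToolSearchService` can be pictured roughly as follows. This is a sketch under assumptions, not the service's actual code: `Hit`, `merge_hits`, the field names, and the example URLs are hypothetical stand-ins, and sorting highest score first is assumed. Only the `(owner, repo, tool_id)` key and the score sort come from the commit's own description.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class Hit:
    """Hypothetical stand-in for a normalized tool hit."""

    source: str  # Tool Shed URL the hit came from
    owner: str
    repo: str
    tool_id: str
    score: float


def merge_hits(per_source: Iterable[List[Hit]]) -> List[Hit]:
    """Dedupe on (owner, repo, tool_id), keeping the first source's hit, then sort by score."""
    seen: Dict[Tuple[str, str, str], Hit] = {}
    for hits in per_source:  # sources are visited in priority order
        for hit in hits:
            key = (hit.owner, hit.repo, hit.tool_id)
            seen.setdefault(key, hit)  # later duplicates are ignored
    # Highest score first (assumed); sorted() is stable, so ties keep insertion order.
    return sorted(seen.values(), key=lambda h: h.score, reverse=True)


main = [Hit("https://shed-a.example", "devteam", "fastqc", "fastqc", 4.2)]
mirror = [
    Hit("https://shed-b.example", "devteam", "fastqc", "fastqc", 9.9),  # duplicate key, dropped
    Hit("https://shed-b.example", "iuc", "multiqc", "multiqc", 5.1),
]
merged = merge_hits([main, mirror])
for hit in merged:
    print(hit.source, hit.owner, hit.repo, hit.tool_id, hit.score)
```

Note that the duplicate from the second source is discarded even though its score is higher; source priority decides which hit survives, and the score only affects the final ordering.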
Lines changed: 332 additions & 0 deletions
@@ -0,0 +1,332 @@
+"""HTTP client for Tool Shed search, repo search, revisions, and TRS versions.
+
+Mirrors ``packages/search/src/client/`` on the TypeScript side. All helpers
+raise :class:`ToolFetchError` on transport failures, non-2xx responses, or
+malformed payloads. ``search_tools`` treats a 404 as an empty page (some
+Tool Shed versions return that when paging past the last page of hits).
+"""
+
+import sys
+from typing import (
+    Iterator,
+    List,
+    Optional,
+)
+from urllib.parse import (
+    quote,
+    urlencode,
+)
+
+import requests
+from pydantic import ValidationError
+
+from ._toolshed_search_models import (
+    RepositorySearchHit,
+    SearchResults,
+    ToolRevisionMatch,
+    ToolSearchHit,
+    TRSToolVersion,
+)
+
+REQUEST_TIMEOUT_S = 30
+
+
+class ToolFetchError(Exception):
+    """Raised on Tool Shed transport failures, non-2xx responses, or malformed payloads."""
+
+    def __init__(self, message: str, url: str, status: Optional[int] = None):
+        super().__init__(message)
+        self.url = url
+        self.status = status
+
+
+def _get_json(url: str, session: Optional[requests.Session], *, allow_404: bool = False) -> Optional[object]:
+    sess = session or requests
+    try:
+        response = sess.get(url, headers={"Accept": "application/json"}, timeout=REQUEST_TIMEOUT_S)
+    except requests.RequestException as err:
+        raise ToolFetchError(f"Tool Shed request to {url} failed: {err}", url) from err
+    if allow_404 and response.status_code == 404:
+        return None
+    if not response.ok:
+        body = (response.text or "")[:200]
+        raise ToolFetchError(
+            f"Tool Shed request to {url} failed: {response.status_code} {body}",
+            url,
+            response.status_code,
+        )
+    try:
+        return response.json()
+    except ValueError as err:
+        raise ToolFetchError(
+            f"Tool Shed response from {url} was not valid JSON: {err}", url, response.status_code
+        ) from err
+
+
+def search_tools(
+    toolshed_url: str,
+    query: str,
+    *,
+    page: Optional[int] = None,
+    page_size: Optional[int] = None,
+    session: Optional[requests.Session] = None,
+) -> SearchResults[ToolSearchHit]:
+    """Fetch one page of tool search results.
+
+    Pass ``query`` verbatim; the Tool Shed wraps with ``*term*`` server-side.
+    """
+    params = {"q": query}
+    if page is not None:
+        params["page"] = str(page)
+    if page_size is not None:
+        params["page_size"] = str(page_size)
+    url = f"{toolshed_url}/api/tools?{urlencode(params)}"
+
+    sess = session or requests
+    try:
+        response = sess.get(url, headers={"Accept": "application/json"}, timeout=REQUEST_TIMEOUT_S)
+    except requests.RequestException as err:
+        raise ToolFetchError(f"Tool Shed search request to {url} failed: {err}", url) from err
+
+    if response.status_code == 404:
+        return SearchResults[ToolSearchHit](
+            total_results=0,
+            page=page or 1,
+            page_size=page_size or 0,
+            hostname=toolshed_url,
+            hits=[],
+        )
+
+    if not response.ok:
+        body = (response.text or "")[:200]
+        raise ToolFetchError(
+            f"Tool Shed search request to {url} failed: {response.status_code} {body}",
+            url,
+            response.status_code,
+        )
+
+    try:
+        payload = response.json()
+    except ValueError as err:
+        raise ToolFetchError(
+            f"Tool Shed search response from {url} was not valid JSON: {err}",
+            url,
+            response.status_code,
+        ) from err
+    try:
+        return SearchResults[ToolSearchHit].model_validate(payload)
+    except ValidationError as err:
+        raise ToolFetchError(
+            f"Tool Shed search response from {url} was malformed: {err}",
+            url,
+            response.status_code,
+        ) from err
+
+
+def iterate_tool_search_pages(
+    toolshed_url: str,
+    query: str,
+    *,
+    page: Optional[int] = None,
+    page_size: Optional[int] = None,
+    session: Optional[requests.Session] = None,
+) -> Iterator[SearchResults[ToolSearchHit]]:
+    """Yield tool search pages until the server returns fewer hits than ``page_size``."""
+    effective_size = page_size if page_size is not None else 10
+    current = page if page is not None else 1
+    while True:
+        results = search_tools(toolshed_url, query, page=current, page_size=effective_size, session=session)
+        yield results
+        if len(results.hits) < effective_size:
+            return
+        current += 1
+
+
+def build_repo_query(query: str, *, owner: Optional[str] = None, category: Optional[str] = None) -> str:
+    parts: List[str] = []
+    trimmed = query.strip()
+    if trimmed:
+        parts.append(trimmed)
+    if owner:
+        parts.append(f"owner:{owner}")
+    if category:
+        parts.append(f'category:"{category}"' if any(c.isspace() for c in category) else f"category:{category}")
+    return " ".join(parts)
+
+
+def search_repositories(
+    toolshed_url: str,
+    query: str,
+    *,
+    page: Optional[int] = None,
+    page_size: Optional[int] = None,
+    owner: Optional[str] = None,
+    category: Optional[str] = None,
+    session: Optional[requests.Session] = None,
+) -> SearchResults[RepositorySearchHit]:
+    q = build_repo_query(query, owner=owner, category=category)
+    params = {"q": q}
+    if page is not None:
+        params["page"] = str(page)
+    if page_size is not None:
+        params["page_size"] = str(page_size)
+    url = f"{toolshed_url}/api/repositories?{urlencode(params)}"
+
+    sess = session or requests
+    try:
+        response = sess.get(url, headers={"Accept": "application/json"}, timeout=REQUEST_TIMEOUT_S)
+    except requests.RequestException as err:
+        raise ToolFetchError(f"Tool Shed repo search request to {url} failed: {err}", url) from err
+
+    if response.status_code == 404:
+        return SearchResults[RepositorySearchHit](
+            total_results=0,
+            page=page or 1,
+            page_size=page_size or 0,
+            hostname=toolshed_url,
+            hits=[],
+        )
+
+    if not response.ok:
+        body = (response.text or "")[:200]
+        raise ToolFetchError(
+            f"Tool Shed repo search request to {url} failed: {response.status_code} {body}",
+            url,
+            response.status_code,
+        )
+
+    try:
+        payload = response.json()
+    except ValueError as err:
+        raise ToolFetchError(
+            f"Tool Shed repo search response from {url} was not valid JSON: {err}",
+            url,
+            response.status_code,
+        ) from err
+    try:
+        return SearchResults[RepositorySearchHit].model_validate(payload)
+    except ValidationError as err:
+        raise ToolFetchError(
+            f"Tool Shed repo search response from {url} was malformed: {err}",
+            url,
+            response.status_code,
+        ) from err
+
+
+def iterate_repo_search_pages(
+    toolshed_url: str,
+    query: str,
+    *,
+    page: Optional[int] = None,
+    page_size: Optional[int] = None,
+    owner: Optional[str] = None,
+    category: Optional[str] = None,
+    session: Optional[requests.Session] = None,
+) -> Iterator[SearchResults[RepositorySearchHit]]:
+    effective_size = page_size if page_size is not None else 10
+    current = page if page is not None else 1
+    while True:
+        results = search_repositories(
+            toolshed_url,
+            query,
+            page=current,
+            page_size=effective_size,
+            owner=owner,
+            category=category,
+            session=session,
+        )
+        yield results
+        if len(results.hits) < effective_size:
+            return
+        current += 1
+
+
+def get_tool_revisions(
+    toolshed_url: str,
+    *,
+    owner: str,
+    repo: str,
+    tool_id: str,
+    version: Optional[str] = None,
+    session: Optional[requests.Session] = None,
+) -> List[ToolRevisionMatch]:
+    """Resolve ``(owner, repo, tool_id[, version])`` to changeset revisions, oldest→newest.
+
+    Returns ``[]`` when the repo is absent, no revisions contain the tool, or
+    ``version`` is supplied but no revision publishes it.
+    """
+    repo_list_url = f"{toolshed_url}/api/repositories?{urlencode({'owner': owner, 'name': repo})}"
+    repo_list = _get_json(repo_list_url, session)
+    if not isinstance(repo_list, list) or not repo_list:
+        return []
+    repo_row = repo_list[0]
+    repo_id = repo_row.get("id") if isinstance(repo_row, dict) else None
+    if not isinstance(repo_id, str):
+        raise ToolFetchError(f"Tool Shed repository listing from {repo_list_url} missing string id", repo_list_url)
+
+    metadata_url = f"{toolshed_url}/api/repositories/{quote(repo_id, safe='')}/metadata?downloadable_only=true"
+    ordered_url = (
+        f"{toolshed_url}/api/repositories/get_ordered_installable_revisions"
+        f"?{urlencode({'owner': owner, 'name': repo})}"
+    )
+    metadata = _get_json(metadata_url, session)
+    ordered = _get_json(ordered_url, session)
+
+    if not isinstance(metadata, dict):
+        return []
+    order_index = {}
+    if isinstance(ordered, list):
+        for i, h in enumerate(ordered):
+            if isinstance(h, str):
+                order_index[h] = i
+
+    matches: List[ToolRevisionMatch] = []
+    for key, meta in metadata.items():
+        if not isinstance(meta, dict):
+            continue
+        colon = key.find(":")
+        hash_ = key[colon + 1 :] if colon >= 0 else key
+        tools = meta.get("tools")
+        if not isinstance(tools, list):
+            continue
+        for t in tools:
+            if not isinstance(t, dict) or t.get("id") != tool_id:
+                continue
+            raw_version = t.get("version")
+            tv = raw_version if isinstance(raw_version, str) else ""
+            if version is not None and tv != version:
+                continue
+            order = order_index.get(hash_, sys.maxsize)
+            matches.append(ToolRevisionMatch(changeset_revision=hash_, tool_version=tv, order=order))
+            break
+
+    matches.sort(key=lambda m: m.order)
+    return matches
+
+
+def get_trs_tool_versions(
+    toolshed_url: str,
+    trs_tool_id: str,
+    session: Optional[requests.Session] = None,
+) -> List[TRSToolVersion]:
+    """Fetch the list of TRS tool versions for ``trs_tool_id`` (``owner~repo~tool_id``).
+
+    Returns the raw server order; Tool Shed returns oldest first.
+    """
+    url = f"{toolshed_url}/api/ga4gh/trs/v2/tools/{quote(trs_tool_id, safe='')}/versions"
+    payload = _get_json(url, session)
+    if not isinstance(payload, list):
+        raise ToolFetchError(f"TRS versions response from {url} was not an array", url)
+    try:
+        return [TRSToolVersion.model_validate(item) for item in payload]
+    except ValidationError as err:
+        raise ToolFetchError(f"TRS versions response from {url} was malformed: {err}", url) from err
+
+
+def get_latest_trs_tool_version(
+    toolshed_url: str,
+    trs_tool_id: str,
+    session: Optional[requests.Session] = None,
+) -> Optional[str]:
+    versions = get_trs_tool_versions(toolshed_url, trs_tool_id, session=session)
+    return versions[-1].id if versions else None
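The stop condition shared by `iterate_tool_search_pages` and `iterate_repo_search_pages` (stop once a page holds fewer hits than the page size) can be exercised without a network. The sketch below swaps the HTTP call for an in-memory fetcher; `iterate_pages`, `make_fetcher`, and the fake hit names are hypothetical and not part of the commit.

```python
from typing import Callable, Iterator, List

PageFetcher = Callable[[int, int], List[str]]


def iterate_pages(fetch_page: PageFetcher, page_size: int = 10, start: int = 1) -> Iterator[List[str]]:
    """Yield pages until one holds fewer than page_size hits."""
    current = start
    while True:
        hits = fetch_page(current, page_size)
        yield hits
        if len(hits) < page_size:
            return
        current += 1


def make_fetcher(total: int) -> PageFetcher:
    """Build a hypothetical in-memory fetcher over `total` fake hits."""
    data = [f"tool_{i}" for i in range(total)]

    def fetch_page(page: int, page_size: int) -> List[str]:
        begin = (page - 1) * page_size
        return data[begin : begin + page_size]

    return fetch_page


page_lengths = [len(p) for p in iterate_pages(make_fetcher(25), page_size=10)]
exact_lengths = [len(p) for p in iterate_pages(make_fetcher(20), page_size=10)]

print(page_lengths)   # [10, 10, 5]
print(exact_lengths)  # [10, 10, 0]
```

When the total is an exact multiple of the page size, the loop requests one page past the end. That is the request for which the client's 404 handling returns an empty page instead of raising.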
