Filecatalog crawler

General

This crawler uses the desy federation website (http://federation.desy.de/fed/lhcb/) as a base to compare the filecatalog with the data saved on the storage elements. If a file is shown on the federation website to be available on a specific SE the crawler will check if the catalog agrees. If not the catalog is missing information. However the crawler only cares that files that are on the federation are also in the crawler, but not vice versa. For now the crawler is a standalone agent. Proper DIRAC agent behaviour has to be implemented in order to work within the DIRAC framework.

Usage

To start the agent initialize the catalog agent and initialize.

CA = catalogAgent()
  gLogger.setLevel("NOTICE")
  res = CA.initialize()

Following two parameters are in the initializer that can be changed:

self.rootURL = 'http://federation.desy.de/fed/lhcb/LHCb/Collision10'
self.dedicatedSE = [] #['CNAF_M-DST','IN2P3_M-DST','CERN-USER']

self.rootURL: Starting point of the crawl. Once the rootURL is reached the crawl has finished self.dedicatedSE: Specify a list of storage elements (SE) which the crawler should compare to. If we check only for SE X and Ia file is shown on the federation website to be available on SE X but the catalog only shows Y and Z then this file will be shown as missing on X. If the federation shows files to be on Y and Z it doesn't check whether they are in the catalogs under Y and Z.

To be done

For now the crawler saves the results in a pickle file and this only after it has finished crawling. This should be improved that the crawl will be dumped in specific intervalls (i.e. after ˜5000 files have been checked). After each intervall checked it could either:

read the pickle dump into a dictionary
update the dictionary with the new results
dump the updated dictionary

or

dump the dictionary with the x files result in a new pickle file

The first method could suffer from long dumping times because it rereads the pickle dump (you cannot add to an existing pickle). Method two will possibly create many pickle files. A database approach is also possible.

Name		Name	Last commit message	Last commit date
Latest commit History 50 Commits
.gitignore		.gitignore
README.md		README.md
dirac.cfg		dirac.cfg
read_Federation.py		read_Federation.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

Filecatalog crawler

General

Usage

To be done

About

Uh oh!

Releases

Packages

Languages

fibbo/readFederation

Folders and files

Latest commit

History

Repository files navigation

Filecatalog crawler

General

Usage

To be done

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages