Bug fix for Relab library: Sometimes data files don't have description #182

Open
kvngvikram wants to merge 3 commits into spectralpython:develop from kvngvikram:develop

@kvngvikram

Hi,

The Relab spectral database sometimes doesn't have a description in its data files (.asc files). The description can be any number of lines before the final Source info and Date lines.
And I guess it is arbitrary. Some files have it and some don't.

In the current state, for files with no description, creating the database results in an error because there is no s.sample['description'], which is required.

So this pull request adds a test spectrum that doesn't have the description, alongside the existing one.
And the code was edited so that any lines that do not have 'Source' or 'Date' are appended as description. And if there is no description, s.measurement['name'] will be used (as intended previously).
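
A minimal sketch of that fallback, assuming the trailing comment lines are already available as a list of strings. The helper name build_description and its arguments are hypothetical and are not taken from relab.py; matching on the start of the line (rather than anywhere in it) is also an assumption.

```python
def build_description(comment_lines, fallback_name):
    """Join comment lines that are not 'Source'/'Date' lines.

    Hypothetical helper: returns fallback_name when no description
    lines remain, mirroring the fallback described in this PR.
    """
    kept = []
    for line in comment_lines:
        text = line.strip()
        # Assumption: the angle and date lines start with these labels.
        if text and not text.startswith(("Source", "Date")):
            kept.append(text)
    return " ".join(kept) if kept else fallback_name
```

With only the two trailing Source/Date lines as input, the function returns the fallback name instead of raising a KeyError later on.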

Note:
Here is the error message for the newer test files but older code:

$ python -m spectral.tests.database
------------------------------------------------------------------------
Running database tests.
------------------------------------------------------------------------
Testing create_database..................................... OK
Testing create_envi_lib..................................... OK
Testing read_signatures..................................... OK
Traceback (most recent call last):
  File "/home/happy/Desktop/spectral/spectral/database/relab.py", line 317, in _import_files
    sampleNum, s['owner'], s['origin'], phase, s['description'])
                                               ~^^^^^^^^^^^^^^^
KeyError: 'description'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/home/happy/Desktop/spectral/spectral/tests/database.py", line 210, in <module>
    run()
    ~~~^^
  File "/home/happy/Desktop/spectral/spectral/tests/database.py", line 201, in run
    T().run()
    ~~~~~~~^^
  File "/home/happy/Desktop/spectral/spectral/tests/spytest.py", line 57, in run
    method()
    ~~~~~~^^
  File "/home/happy/Desktop/spectral/spectral/tests/database.py", line 90, in test_create_database
    db = spy.RelabDatabase.create(RELAB_DB, RELAB_DATA_DIR)
  File "/home/happy/Desktop/spectral/spectral/database/relab.py", line 246, in create
    db._import_files(relab_data_dir)
    ~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^
  File "/home/happy/Desktop/spectral/spectral/database/relab.py", line 319, in _import_files
    raise Exception ('Error creating IDD')
Exception: Error creating IDD
Testing create_database.....................................

@kvngvikram kvngvikram closed this Feb 20, 2026
@tboggs
Member

tboggs commented Feb 20, 2026

Apologies, I've been away on other projects for a while. If this commit is still relevant, let me know and I will merge.

@kvngvikram
Author

Hi @tboggs,

Yes, the issue still persists. But I found another bug with another file, so I wanted to wait until I was confident.
I was also able to clarify some points with the maintainers of RELAB from Brown University.

So here is the summary of the discussion regarding the format of .asc files.

ASC files are structured as:
Number of data lines
Data lines (Wavelength in nm, Reflectance, and SD if any)
Three blank lines
File name
A blank line
Sample ID
Comment lines
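
The layout above could be read roughly as follows. This is only a sketch under stated assumptions: the function name parse_asc_text and the returned dictionary keys are hypothetical, data columns are assumed to be whitespace-separated, and the sample text in the usage note is made up rather than taken from a real RELAB file.

```python
def parse_asc_text(text):
    """Split RELAB-style .asc text into data rows and trailing metadata.

    Hypothetical sketch that follows the layout listed above; it is not
    the read_relab_file implementation.
    """
    lines = text.splitlines()
    n = int(lines[0].split()[0])      # first line: number of data lines
    rows = [tuple(float(v) for v in ln.split()) for ln in lines[1:1 + n]]
    rest = lines[1 + n + 3:]          # skip the three blank lines
    file_name = rest[0].strip() if rest else ""
    # rest[1] is the blank line between the file name and the sample ID.
    sample_id = rest[2].strip() if len(rest) > 2 else ""
    # Comment lines are optional, so this list may be empty.
    comments = [ln.strip() for ln in rest[3:] if ln.strip()]
    return {"rows": rows, "file_name": file_name,
            "sample_id": sample_id, "comments": comments}
```

For example, calling it on "2\n300.0 0.10\n305.0 0.12 0.01\n\n\n\nfile01\n\nSAMPLE-01\n" gives two data rows, the file name, the sample ID, and an empty comment list.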

Now, the current issues

  1. The current code fails when there are no comment lines (saved as s['description']). This PR was trying to solve that.

  2. The current code also fails when a file has the word Volatite in the comment lines. Volatite describes a particular property of the sample. The issue arises because many files end with the following two lines:

 Source Ang: 30.00   Detect Ang:  0.00  Volt: 0.777
 Date: 25-JUL-12   Time: 17:11:03

Now the current code confuses the word Volatite with Volt.
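
The matching logic of the current code is not shown in this thread, so the following is only one possible way to avoid the confusion: require the literal label "Volt:" followed by a number. The names VOLT_RE and extract_volt are hypothetical.

```python
import re

# Match the literal label 'Volt:' followed by a number, so that other
# words beginning with 'Vol' are not mistaken for the voltage field.
VOLT_RE = re.compile(r"\bVolt:\s*([-+]?\d*\.?\d+)")


def extract_volt(line):
    """Return the voltage as a float, or None if the line has no 'Volt:' field."""
    match = VOLT_RE.search(line)
    return float(match.group(1)) if match else None
```

Applied to the Source Ang line above this yields 0.777, while a comment line that merely contains the word Volatite yields None.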

I was only able to find these two because I was working with the entire database instead of downloading fewer files.

Here is some more info I got from the maintainer:

Comment lines do often include important information such as angles and date, but they are not always consistently there.

Because we have several different spectrometers used and software used went through revisions and downtime when we had to take manual conversion from the raw data files to the publishable files, there are some inconsistencies on what are included in what way in the comment lines.

Incidence, emergence, and phase angles, and dates are available in our Spectra_Catalogue file.

Regarding the last two lines:

Those two lines including the angles and date are most likely there because they were produced by our program usually used as part of data conversion.
However, we have different types of spectral data which may not go through that particular conversion.

Regarding some comment lines starting with the '*' character (like '#' in Python code):

* is probably part of the raw data file, part of which may be carried to the final spectral files when the conversion process was done manually or in a special way.

Now....

The two points, handling files with no description/comment lines and handling Volt: correctly, have to be fixed.
This can be done by fixing just the function read_relab_file.

I want two more spectra added to the tests (picking these from the complete database I downloaded).

But I also feel that the structure of the function doesn't exactly follow the given .asc file structure.
For example, it could use the number of (x, y) pairs from the first line instead of waiting for a blank line, then skip exactly 3 lines instead of an arbitrary number of lines, and expect the Source Angle and Date lines only as the last two lines, if they exist.
Of course these are subjective opinions, so I am up for discussion.

Is it ok if I do a different PR? Should I stick to solving the two issues, or do more?

@tboggs
Member

tboggs commented Feb 23, 2026

Thanks for the detailed explanation. I'm fine with two patches but if one is easier, that works as well.

@kvngvikram kvngvikram reopened this Feb 28, 2026
@kvngvikram
Author

Hi,

I have changed the read_relab_file function, and added test files. Hope they are not too big.

Key points of this PR:

Most of the complication in the function is because we are trying to parse the comment blocks. They are not consistent, not even the last two lines. The maintainer himself suggested using the catalogue files for this kind of information.

Previously the Relab ID was taken from the first line of the file. This is a mistake, as we now know that the first line is just the number of data points and not an ID.

I tried to keep the other metadata consistent with before, even though I feel they are not right.

My comments and opinions (not confined to this PR):

To give you the context of the database, there are two Relab IDs: the sample ID and the spectrum ID. It is possible to have multiple spectra (different spectrometers, angles) for the same sample. The sample catalogue has info on properties of the sample like collection location, composition, etc., while the spectrum catalogue has info on spectrometer range, angles, etc.

Ideally a database should have two table schemas, one each to store the info of the spectrum catalogue and the sample catalogue. Each entry in the spectrum table would have the data, the spectrum ID (the .asc filename itself), and the corresponding sample ID (so that the sample info in the first table can be used). This means refactoring the entire relab.py with different table schemas, and then attempting to read both catalogue files if the user wants, or else just reading a few or all of the .asc files into the spectrum table.
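
To make the idea concrete, here is a rough sketch of such a two-table layout using Python's sqlite3 module. The table names, column names, and inserted values are all hypothetical placeholders; this is not the schema currently used in relab.py.

```python
import sqlite3


def create_schema(conn):
    """Create hypothetical sample and spectrum tables linked by sample_id."""
    conn.executescript("""
        CREATE TABLE IF NOT EXISTS samples (
            sample_id   TEXT PRIMARY KEY,
            description TEXT
        );
        CREATE TABLE IF NOT EXISTS spectra (
            spectrum_id TEXT PRIMARY KEY,  -- e.g. the .asc file name
            sample_id   TEXT REFERENCES samples(sample_id),
            data        BLOB
        );
    """)


conn = sqlite3.connect(":memory:")
create_schema(conn)
conn.execute("INSERT INTO samples VALUES (?, ?)", ("S1", "placeholder sample"))
conn.executemany(
    "INSERT INTO spectra VALUES (?, ?, ?)",
    [("spec_a", "S1", b""), ("spec_b", "S1", b"")],
)
# Several spectra can point to the same sample.
rows = conn.execute(
    "SELECT spectrum_id FROM spectra WHERE sample_id = ? ORDER BY spectrum_id",
    ("S1",),
).fetchall()
```

Keeping the sample ID as the link between the two tables means a sample's properties are stored once, however many spectra were measured for it.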

Well, for now I am limiting myself to this PR.

Hope this helps,
Thanks.
