Python question

Meow Mix

Chatte Féministe
Well, I've posted elsewhere, but I might as well try all my options; someone here was helpful last time when StackExchange was not.

I do not know Python well. I'm piecing together code based on examples I've seen elsewhere, and I've only trained in Python for making plots and things like this. So bear with me as much as possible.

I have data from the CANDELS survey with some 180,000 galaxies in five different fields (UDS, GOODS-S, GOODS-N, COSMOS, and EGS).

Unfortunately for me, the Hubble teams clearly compiled data separately because what they've sent me comes in the form of catalogues with differently named columns. I'm interested in making a master file for all five fields where I can get the median mass calculated by the Hodges-Lehmann estimator in the linear space considering only estimates with the same assumptions for IMF and stellar templates.
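The catalogues already provide this value as a column, so nothing here needs to be recomputed. Still, as a sketch of what "the Hodges-Lehmann median in linear space" means: assuming each galaxy has several log10 mass estimates (one per team), convert them to linear masses, take the median of all pairwise Walsh averages (x_i + x_j)/2 with i ≤ j, then go back to log10. The function names and sample numbers are made up for illustration, and some texts define the estimator with i < j instead, so the exact convention behind the catalogue column is an assumption.

```python
import math
from statistics import median


def hodges_lehmann(values):
    """One-sample Hodges-Lehmann estimator: the median of all Walsh
    averages (x_i + x_j) / 2 with i <= j (each value is also paired with itself)."""
    n = len(values)
    if n == 0:
        raise ValueError("need at least one value")
    walsh = [
        (values[i] + values[j]) / 2.0
        for i in range(n)
        for j in range(i, n)
    ]
    return median(walsh)


def hl_log_mass(log_masses):
    """Combine several log10(M/Msun) estimates for one galaxy by applying
    the Hodges-Lehmann estimator in linear space, then returning log10."""
    linear = [10.0 ** m for m in log_masses]
    return math.log10(hodges_lehmann(linear))


if __name__ == "__main__":
    # Hypothetical per-team log10 mass estimates for a single galaxy.
    estimates = [10.1, 10.3, 10.2, 10.6]
    print(round(hl_log_mass(estimates), 3))
```

Building every pairwise average is quadratic in the number of estimates, which is harmless here because each galaxy only has a handful.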

But in one catalogue (say UDS) this might be called M_med, while in another catalogue this might be called M_med_HL.

So I've written code that will make a combined csv if the columns are all named the same. But they're not: two of the fields have this one column named differently.

Additionally, two of the other fields don't have the values of this column on a logarithmic scale (I basically just need to log_10 the column).
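If the log10 step is done in Python rather than in another tool, a minimal sketch follows. It assumes the column holds linear masses (for example in solar masses) and treats zero, negative or missing entries as unloggable, turning them into NaN instead of letting `math.log10` raise a `ValueError` partway through 180,000 rows. How the real catalogues flag bad values is an assumption worth checking in their documentation.

```python
import math


def to_log10(values):
    """Convert linear values (e.g. masses in solar masses) to log10.
    Zero, negative or missing (None) entries cannot be logged, so they
    become NaN instead of raising an error mid-catalogue."""
    out = []
    for v in values:
        if v is None or v <= 0:
            out.append(float("nan"))
        else:
            out.append(math.log10(v))
    return out
```

With NumPy or pandas, `numpy.log10` applied to a whole column does the same job in one call; for non-positive inputs it returns -inf or NaN with a runtime warning rather than raising.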

So all in all, I need to change the column name in two of the source docs and put the column on a logarithmic scale in two of the other source docs; then I will have everything in place to combine them into one master doc.

I've considered just copying the files, going into the source files and fixing the column names, but they're "security catalogue" files and I have no idea how to open them except with Python.

This is what I'm going to combine things with:

[Image: "Pythmass" by Meow Mix, posted Aug 11, 2021 at 2:22 AM]
 

Meow Mix

Chatte Féministe
Actually, as I typed this I had an idea: I can also open this filetype as ASCII in a program called Topcat. Column names can't be edited there, but I can create new columns and just copy the data over. I should also be able to put them on a logarithmic scale and then resave them as a new catalogue (it'd just have some extra columns, but I don't think Python cares as long as I specify only the ones I want).

This might be a moot point. I'm going to try that, but I'm leaving this up in case it doesn't work.
 

lewisnotmiller

Grand Hat
Staff member
Premium Member

You can't just dump the files into some sort of SQL database and do a simple UNION ALL?

Sorry, I don't know Python at all. But I've encountered this type of problem plenty of times in PL/SQL (or straight SQL), and if the data types of the fields are the same it's dead easy.

Might be misunderstanding what you're doing though.
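Following up on the SQL suggestion: Python ships with the `sqlite3` module, so a UNION ALL doesn't need a separate database server. The sketch below uses two tiny hypothetical tables standing in for two catalogues; aliasing the differently named column lets the rows stack under one heading, and a string literal records which field each row came from.

```python
import sqlite3

# In-memory SQLite database: nothing to install. Table names, column names
# and values are made-up stand-ins for two catalogues.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE uds (ID INTEGER, M_med REAL)")
con.execute("CREATE TABLE goods_n (ID INTEGER, M_med_HL REAL)")
con.executemany("INSERT INTO uds VALUES (?, ?)", [(1, 10.2), (2, 9.7)])
con.executemany("INSERT INTO goods_n VALUES (?, ?)", [(1, 10.5)])

# UNION ALL stacks the rows of both SELECTs (keeping duplicates). The result
# takes its column names from the first SELECT, so the alias on the second
# is only for readability.
cur = con.execute(
    """
    SELECT 'UDS' AS field, ID, M_med FROM uds
    UNION ALL
    SELECT 'GOODS-N' AS field, ID, M_med_HL AS M_med FROM goods_n
    """
)
names = [d[0] for d in cur.description]
rows = cur.fetchall()
con.close()
```

The log10 conversion is easiest to do in Python before inserting the rows, since SQLite's built-in math functions such as `log10()` are only present in newer builds compiled with math-function support.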
 

Daemon Sophic

Avatar in flux
Hmmmm not Monty then.....

 

Yerda

Veteran Member
Have you used the pandas library before?

I ask because I don't really know how to solve the problem, but I have played around with dataframes in pandas, and merging things and renaming columns was straightforward. This answer from Stack Overflow shows what I mean:

Merge two data-sets in Python Pandas
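Roughly, the pandas route could look like this (a sketch only): three tiny made-up DataFrames stand in for catalogues (one already fine, one using `M_med_HL` instead of `M_med`, one with linear masses), which get renamed, log10'd, tagged with their field, and stacked. Stacking rows is `pd.concat`; `merge` joins tables side by side on a shared key, which isn't what's needed here.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins; real catalogues would be loaded from file.
uds = pd.DataFrame({"ID": [1, 2], "M_med": [10.2, 9.7]})   # already log10
goods_n = pd.DataFrame({"ID": [1], "M_med_HL": [10.5]})     # different name
cosmos = pd.DataFrame({"ID": [1], "M_med": [1.0e10]})       # linear masses

# Fix the column name in one frame and the scale in another.
goods_n = goods_n.rename(columns={"M_med_HL": "M_med"})
cosmos["M_med"] = np.log10(cosmos["M_med"])

# Tag every row with its field, then stack all frames into one.
frames = {"UDS": uds, "GOODS-N": goods_n, "COSMOS": cosmos}
master = pd.concat(
    [df.assign(field=name) for name, df in frames.items()],
    ignore_index=True,
)
# master.to_csv("master.csv", index=False)  # hypothetical output filename
```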

Hope this is helpful. I'm a total beginner with Python and programming in general.
 

HonestJoe

Well-Known Member
I don't know Python specifically, but in principle my instinct would be to manipulate the data within your code. After you've loaded your datasets with their varied column headings, you copy the data to new, different data objects with either generic column headings or none at all. Then you can merge the data and output it with whatever column headings you want for your combined output file.

It will depend on your specific requirements and tools whether it is actually easier to change the source files or do it in the code as I described but the latter should be possible somehow.
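A minimal standard-library sketch of this idea, with no pandas needed: each catalogue's rows are copied into records with neutral keys (`field`, `id`, `log_mass`), log10 is applied only where a field needs it, and the combined records are written out under headings of our choosing. The per-field settings, the `ID` column name and the CSV input format are all assumptions for illustration.

```python
import csv
import io
import math

# Hypothetical per-field settings: which column holds the mass and whether
# it still needs log10. These are placeholders, not facts about CANDELS.
FIELDS = {
    "UDS": {"mass_col": "M_med", "needs_log": False},
    "GOODS-N": {"mass_col": "M_med_HL", "needs_log": False},
    "COSMOS": {"mass_col": "M_med", "needs_log": True},
}


def load_generic(field, reader):
    """Copy rows from one catalogue into neutral records whose keys no
    longer depend on how that catalogue named its columns."""
    cfg = FIELDS[field]
    for row in reader:
        mass = float(row[cfg["mass_col"]])
        if cfg["needs_log"]:
            mass = math.log10(mass)
        yield {"field": field, "id": row["ID"], "log_mass": mass}


def write_master(records, out):
    """Write the combined records under headings of our own choosing."""
    writer = csv.DictWriter(out, fieldnames=["field", "id", "log_mass"])
    writer.writeheader()
    writer.writerows(records)


# Tiny in-memory stand-ins for real files (which would be opened with
# open(path, newline="") instead of io.StringIO).
sources = {
    "UDS": "ID,M_med\n1,10.2\n",
    "GOODS-N": "ID,M_med_HL\n7,10.5\n",
    "COSMOS": "ID,M_med\n3,1e10\n",
}
records = []
for field, text in sources.items():
    records.extend(load_generic(field, csv.DictReader(io.StringIO(text))))

buf = io.StringIO()
write_master(records, buf)
```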
 

Meow Mix

Chatte Féministe
Is this the same kind of data that can be found here (referenced from archive home page)? I was just trying to get a feel for what the various data files look like.

Those look like .fits files; these are galaxy images/cubes. I have those too, but the security catalogues are data on redshift, mass, spectrometry, morphology, etc.
 

ohiogrown

New Member

Are you using Windows? The reason I ask is that Windows thinks a file ending with ".cat" is a "security catalog", which is actually something completely different. If you renamed the files to add .txt, you probably could look at them with notepad (assuming it didn't balk at such large files).

If I was doing this on a Linux system, I'd probably try to use the "cut" command to extract the columns I was interested in from each file. I think that's similar to what you're attempting to do with the "topcat" program.
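Assuming the renamed `.cat` file really is plain text with whitespace-separated columns and `#`-prefixed header lines (an assumption; the exact layout isn't shown here), a rough Python analogue of `cut` is short. The function name, column positions and sample lines are hypothetical.

```python
def cut_columns(lines, indices, comment="#"):
    """Rough Python analogue of `cut`: keep only the requested
    whitespace-separated columns (0-based indices), skipping blank
    lines and header/comment lines that start with `comment`."""
    for line in lines:
        stripped = line.strip()
        if not stripped or stripped.startswith(comment):
            continue
        parts = stripped.split()  # any run of spaces/tabs is one separator
        yield [parts[i] for i in indices]


sample = [
    "# ID RA DEC M_med",
    "1 53.1 -27.8 10.2",
    "2 53.2 -27.7 9.7",
]
print(list(cut_columns(sample, [0, 3])))
```

Using `str.split()` with no argument treats any run of spaces or tabs as one separator, which suits space-aligned columns; `cut` splits on tabs by default, and with `-d ' '` every single space counts as a separator. For a real file, the open file object can be passed directly, e.g. `with open(path) as f: rows = list(cut_columns(f, [0, 3]))`.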
 

Meow Mix

Chatte Féministe

I should have posted again to let people know: the Topcat method did work. I just made new columns, ran the ones that needed it through log10, unchecked the old columns, then used Topcat to merge them (I didn't even need to run them back through Python) and saved the result as a CSV. Now I can load it in Python just fine. Probably a long way to do things, but it got the job done.
 