Categories
AWS

AWS: SAM gotchas

If your lambda is just returning before expected, check the default Timeout in template.yaml, most likely you will need a bigger number there

Commands:

sam init

In case you don’t know where to start, that will walk you thru the process and download a sample app for ya

sam build

(pack the latest changes, it will rerun anything new you put in requirements.txt as well

sam local invoke “YourFunctionName” -e events/yourevent.json

run your function locally, with your own event

sam deploy

put it out there

Categories
redshift Uncategorized

Redshift: unable to connect newly created instance

Problem: you just created a new instance, and even though you told it to be publicly available, you can’t connect to it using the provided endpoint….

Solution: you need to explicitly add your current ip address to the security group you are using. Their default security group is miss-leading: even though it says it will accept all traffic from everywhere, it doesn’t (sad panda). Once you add a new security group and attach it to the redshift instance, you will be fine fine fine.

Source of the solution: the infamous stackoverflow:

https://stackoverflow.com/questions/19842720/cant-connect-to-redshift-database

Categories
node.js

puppeteer: a node.js package to simulate browsers

It can be used for web scrapping as well.

Setting up one:

npm init --yes
npm install puppeteer

Create an index.js page, with the following:

const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto('https://yoururlhere.com/');
  await page.screenshot({ path: 'takingasnapshotofyourpagehere.png' });

  await browser.close();
})();
Categories
express node.js

Express: the basics

On returning content

# automagically detects content and set the right headers:

res.send(returnhere)

# explicitly return json objects:

res.json(JSON.stringify(returnhere))

# you are more interested in sending http codes than actual return objects:

res.status(404).end();

Categories
mongodb

mongodb: the basics

# check existing collections:

show collections

# drop a collection:

db.collectionnamehere.drop()

Categories
Apache Spark

Apache Spark: the basics

RDD: Resilient Distributed Dataset

It’s encapsulation on a collection of data. Distributed in clusters automatically. RDDs are immutable.

You can apply transformations (return a new RDD with, for example, filtered data), and Actions (like First(), to return the first item in them

RDDs are resilient, they can lose nodes, and be able to recreate them automagically

Transformations in RDD are lazy loaded. For example, if we have lines that open a file, and then filter it, the opening of the file won’t happen right away. Spark reads the entire data set first, and determine to only save the filtered data, for example.

SparkContext

It is a connection to a computer cluster, used to build RDDs

Example of loading an RDD from external storage:

sc = SparkContext(“local”, “texfile”) # builds the context first
lines = sc.textFile(“in/uppercase.text”) # creates the RDD

Transformations

They do not mutate the current RDD, they return a new one.

filter() # returns elements passing the filter function

map() # applies the map function to each element on the original RDD and return the results in a new one

RDD actions

collect: puts the entire RDD in the driver program, usually to persist to disk. Memory intensive, make sure it is use in filtered, small datasets

count / countByValue: count number of rows, or number of unique values

take: return a subset of the RDD

saveAsTextFile: outputs to a storage in text mode

reduce: apply a lambda function to all the elements, two at a time, until we get a single result in return

persist: it will keep a copy of a produced RDD in memory, available fast for all nodes. You can pass the prefer storage level you like (DISK_ONLY, MEMORY_ONLY, etc) unpersist removes from the cache.

Categories
java

Java: the basics

Use int, long (primitives), instead of their objects (Integer, Long)

primitives are the atomic, basic, data types, unless you know what you are doing, stick to those.

They (primitives) are passed by value. Long and Integer are the object form of the primitives, to be avoided unless you need to pass by reference, or pass and make the code infer the type.

sample of using inline filters:
somelist.stream().anyMatch(s -> s.getId() == COMPARE_ID)

where s is a particular member of somelist, with getId() as a method, and we are just picking the one where id match COMPARE_ID in this case

Spring / autowire

  • when you see it on a class, it pretty much means “you are going to need one of these, and I am going to wire it for you”. Example:
public class SomeClass  {
...
private ChallengeManager challengeManager;
@Autowired
public void setChallengeManager(@Qualifier(SpringConstants.COMPONENT_CHALLENGE_MANAGER) ChallengeManager challengeManager) {
    this.challengeManager = challengeManager;
}

so when SomeClass gets spawned, it will automagically include the class marked by @Autowired

the @Qualifier(SOMECONSTANT) is to ensure it is the class you want to autowire

in complex systems, there may be more than one ChallengeManager, so that qualifier and constant will make sure we are auto wiring the right one

Throwing and catching

if an interface is marked as “throws”, there should be a try catch that deal with the specified throwable exception in there

but if you want to make the parent calls of the class / method to deal with it instead, you can just add “throws ExceptionName” in the signature

Categories
AWS Glue

AWS Glue: the basics

  1. Crawl your data source first
    1. to create a catalog
    2. and table definitions
  2. Add a job to process your crawled data

That’s all!

Categories
react.js

React.js: the basics

Basic element creation:

ReactDOM.render(React.createElement('h1', null, 'Hello world!'),document.getElementById('content'))

The first argument, the element
The second: the data to be feed to that element
The third, the innerHTML inside that element

ReactDOM.render does the actual appending to the page



React Hooks
Example (look ma' no classes!):

const GeneralStats = () => {

useEffect(() => {
     // fetch your data or whatever you did on react before hooks here, useEffect is similar to componentdidmount
});

    return (
        <div className="Home">
            Please wait, loading ... 
        </div>
    );

}


export default GeneralStats;

Categories
redshift

Redshift: alter table column TYPE is not allowed

Only allowed for varchar column types.

The trick to get it done:

ALTER TABLE sometable ADD COLUMN some_new_column (with the new definition you want)
UPDATE sometable SET some_new_column = old_column;
ALTER TABLE sometable DROP COLUMN old_column;
ALTER TABLE sometable RENAME COLUMN some_new_column TO old_column;

The catch: column order will be altered (the new column will be the last now

If you use copy to fill out that table, you can’t reorder columns to make it fit still

If that is your setup, instead of create a new column, create a new table with the right TYPE, and do as above